I keep hearing the statement on most programming related sites:
Program to an interface and not to an Implementation
However, I don't understand the implications.
Examples would help.
EDIT: I have received a lot of good answers; even so, could you supplement them with some snippets of code for a better understanding of the subject? Thanks!
You are probably looking for something like this:
public static void main(String... args) {
// do this - declare the variable to be of type Set, which is an interface
Set buddies = new HashSet();
// don't do this - you declare the variable to have a fixed type
HashSet buddies2 = new HashSet();
}
Why is it considered good to do it the first way? Let's say later on you decide you need to use a different data structure, say a LinkedHashSet, in order to take advantage of the LinkedHashSet's functionality. The code has to be changed like so:
public static void main(String... args) {
// do this - declare the variable to be of type Set, which is an interface
Set buddies = new LinkedHashSet(); // <- change the constructor call
// don't do this - you declare the variable to have a fixed type
// here you have to change both the variable type and the constructor call
// HashSet buddies2 = new HashSet(); // old version
LinkedHashSet buddies2 = new LinkedHashSet();
}
This doesn't seem so bad, right? But what if you wrote getters the same way?
public HashSet getBuddies() {
return buddies;
}
This would have to be changed, too!
public LinkedHashSet getBuddies() {
return buddies;
}
Hopefully you see that even in a small program like this, the declared type of a variable has far-reaching implications. With objects being passed back and forth so much, the program is definitely easier to code and maintain if you declare a variable as an interface, not as a specific implementation of that interface (in this case, declare it to be a Set, not a LinkedHashSet or whatever). The getter can be just this:
public Set getBuddies() {
return buddies;
}
There's another benefit too, in that (well at least for me) the difference helps me design a program better. But hopefully my examples give you some idea... hope it helps.
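To tie the snippets above together, here is a minimal sketch of a whole class written this way. The `Person` class and its method names are hypothetical, made up for illustration; the point is only that a single line names the concrete type.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical class for illustration; only one line names the concrete type.
class Person {
    // Swap LinkedHashSet for HashSet or TreeSet here and nothing else changes.
    private final Set<String> buddies = new LinkedHashSet<>();

    void addBuddy(String name) {
        buddies.add(name);
    }

    // Callers see only the Set interface (here as a read-only view).
    Set<String> getBuddies() {
        return Collections.unmodifiableSet(buddies);
    }
}

class BuddiesDemo {
    public static void main(String[] args) {
        Person p = new Person();
        p.addBuddy("Ann");
        p.addBuddy("Bob");
        p.addBuddy("Ann"); // duplicate, ignored by any Set implementation
        Set<String> buddies = p.getBuddies();
        System.out.println(buddies.size()); // prints 2
    }
}
```

Code that calls `getBuddies()` compiles unchanged whichever `Set` implementation the field uses.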
One day, a junior programmer was instructed by his boss to write an application to analyze business data and condense it all in pretty reports with metrics, graphs and all that stuff. The boss gave him an XML file with the remark "here's some example business data".
The programmer started coding. A few weeks later he felt that the metrics and graphs and stuff were pretty enough to satisfy the boss, and he presented his work. "That's great" said the boss, "but can it also show business data from this SQL database we have?".
The programmer went back to coding. There was code for reading business data from XML sprinkled throughout his application. He rewrote all those snippets, wrapping them with an "if" condition:
if (dataType == "XML")
{
... read a piece of XML data ...
}
else
{
... query something from the SQL database ...
}
When presented with the new iteration of the software, the boss replied: "That's great, but can it also report on business data from this web service?" Remembering all those tedious if statements he would have to rewrite AGAIN, the programmer became enraged. "First xml, then SQL, now web services! What is the REAL source of business data?"
The boss replied: "Anything that can provide it"
At that moment, the programmer was enlightened.
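What the programmer might have written afterwards could look roughly like the sketch below. Every name here (`BusinessDataSource`, `fetchRecords`, `ReportGenerator`, and so on) is an assumption made for illustration; the story itself does not specify any API, and the data sources are stubbed rather than really reading XML or SQL.

```java
import java.util.List;

// Hypothetical names throughout; the story does not specify any API.
interface BusinessDataSource {
    List<String> fetchRecords();
}

class XmlDataSource implements BusinessDataSource {
    @Override
    public List<String> fetchRecords() {
        // A real version would parse an XML file here.
        return List.of("record from XML");
    }
}

class SqlDataSource implements BusinessDataSource {
    @Override
    public List<String> fetchRecords() {
        // A real version would run a SQL query here.
        return List.of("record from SQL");
    }
}

class ReportGenerator {
    private final BusinessDataSource source;

    ReportGenerator(BusinessDataSource source) {
        this.source = source;
    }

    // No if/else on the data type: any source that fulfils the interface works.
    String render() {
        return String.join("\n", source.fetchRecords());
    }
}

class ReportDemo {
    public static void main(String[] args) {
        System.out.println(new ReportGenerator(new XmlDataSource()).render());
        System.out.println(new ReportGenerator(new SqlDataSource()).render());
    }
}
```

Adding the web service then means writing one more class that implements the interface; the reporting code is not touched.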
An interface defines the methods an object is committed to respond to.
When you code to the interface, you can change the underlying object and your code will still work (because your code is agnostic of WHO performs the job or HOW the job is performed). You gain flexibility this way.
When you code to a particular implementation, if you need to change the underlying object your code will most likely break, because the new object may not respond to the same methods.
So to put a clear example:
If you need to hold a number of objects you might have decided to use a Vector.
If you need to access the first object of the Vector you could write:
Vector items = new Vector();
// fill it
Object first = items.firstElement();
So far so good.
Later you decide that, for "some" reason, you need to change the implementation (let's say the Vector creates a bottleneck due to excessive synchronization).
You realize you need to use an ArrayList instead.
Well, your code will break ...
ArrayList items = new ArrayList();
// fill it
Object first = items.firstElement(); // compile time error.
You can't. This line, and every other line that uses the firstElement() method, would break.
If you need specific behavior and you definitely need this method, that might be OK (although you won't be able to change the implementation). But if all you need is to retrieve the first element (that is, there is nothing special about the Vector other than that it has the firstElement() method), then using the interface rather than the implementation gives you the flexibility to change.
List items = new Vector();
// fill it
Object first = items.get( 0 ); //
In this form you are not coding to the get method of Vector, but to the get method of List.
It does not matter how the underlying object performs the method, as long as it honors the contract of "get the 0th element of the collection".
This way you may later change it to any other implementation:
List items = new ArrayList(); // Or LinkedList, or any other class that implements List
// fill it
Object first = items.get( 0 ); // Doesn't break
This sample might look naive, but it is the foundation on which OO technology is built (even in languages that are not statically typed, like Python, Ruby, Smalltalk, Objective-C, etc.).
A more complex example is the way JDBC works. You can change the driver, but most of your calls will work the same way. For instance, you could use the standard driver for Oracle databases, or a more sophisticated one like those that WebLogic or WebSphere provide. Of course it isn't magic: you still have to test your product, but at least you don't have stuff like:
statement.executeOracle9iSomething();
vs
statement.executeOracle11gSomething();
Something similar happens with Java Swing.
Additional reading:
Design Principles from Design Patterns
Effective Java Item: Refer to objects by their interfaces
( Buying this book is one of the best things you could do in life - and reading it, of course - )
My initial read of that statement is very different than any answer I've read yet. I agree with all the people that say using interface types for your method params, etc are very important, but that's not what this statement means to me.
My take is that it's telling you to write code that only depends on what the interface (in this case, I'm using "interface" to mean exposed methods of either a class or interface type) you're using says it does in the documentation. This is the opposite of writing code that depends on the implementation details of the functions you're calling. You should treat all function calls as black boxes (you can make exceptions to this if both functions are methods of the same class, but ideally it is maintained at all times).
Example: suppose there is a Screen class that has Draw(image) and Clear() methods on it. The documentation says something like "the draw method draws the specified image on the screen" and "the clear method clears the screen". If you wanted to display images sequentially, the correct way to do so would be to repeatedly call Clear() followed by Draw(). That would be coding to the interface. If you're coding to the implementation, you might do something like only calling the Draw() method because you know from looking at the implementation of Draw() that it internally calls Clear() before doing any drawing. This is bad because you're now dependent on implementation details that you can't know from looking at the exposed interface.
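The Screen example could be sketched in Java as follows. The `Screen` interface, the `LayeringScreen` implementation and the `Slideshow` helper are all hypothetical; `LayeringScreen` is deliberately written so that `draw` does not clear first, which the quoted documentation allows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Screen contract, following the documentation quoted above.
interface Screen {
    /** Draws the specified image on the screen. */
    void draw(String image);

    /** Clears the screen. */
    void clear();
}

// One possible implementation; draw() does NOT clear first, which the contract permits.
class LayeringScreen implements Screen {
    final List<String> visible = new ArrayList<>();

    @Override
    public void draw(String image) {
        visible.add(image);
    }

    @Override
    public void clear() {
        visible.clear();
    }
}

class Slideshow {
    // Relies only on the documented behaviour: clear, then draw.
    static void show(Screen screen, List<String> images) {
        for (String image : images) {
            screen.clear();
            screen.draw(image);
        }
    }
}

class SlideshowDemo {
    public static void main(String[] args) {
        LayeringScreen screen = new LayeringScreen();
        Slideshow.show(screen, List.of("first.png", "second.png"));
        System.out.println(screen.visible); // prints [second.png]
    }
}
```

A slideshow that skipped `clear()` because some other implementation happened to clear inside `draw()` would leave both images on this screen.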
I look forward to seeing if anyone else shares this interpretation of the phrase in the OP's question, or if I'm entirely off base...
It's a way to separate responsibilities / dependencies between modules.
By defining a particular Interface (an API), you ensure that the modules on either side of the interface won't "bother" one another.
For example, say module 1 will take care of displaying bank account info for a particular user, and module2 will fetch bank account info from "whatever" back-end is used.
By defining a few types and functions, along with the associated parameters, for example a structure defining a bank transaction, and a few methods (functions) like GetLastTransactions(AccountNumber, NbTransactionsWanted, ArrayToReturnTheseRec) and GetBalance(AccountNumber), Module1 will be able to get the needed info and not worry about how this info is stored or calculated or whatever. Conversely, Module2 will just respond to the method calls by providing the info as per the defined interface, but won't worry about where this info is to be displayed, printed or whatever...
When a module is changed, the implementation of the interface may vary, but as long as the interface remains the same, the modules using the API may at worst need to be recompiled/rebuilt, but they do not need to have their logic modified in any way.
That's the idea of an API.
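Rendered in Java, the two modules and the API between them might look like the sketch below. The names `Transaction`, `AccountService`, `InMemoryAccountService` and `AccountView` are assumptions for illustration, and the back-end is an in-memory stub rather than a real data store.

```java
import java.util.List;

// Hypothetical Java rendering of the API described above.
final class Transaction {
    final String description;
    final long amountInCents;

    Transaction(String description, long amountInCents) {
        this.description = description;
        this.amountInCents = amountInCents;
    }
}

// The interface both modules agree on.
interface AccountService {
    List<Transaction> getLastTransactions(String accountNumber, int count);

    long getBalance(String accountNumber);
}

// "Module 2": one possible back-end, here just an in-memory stub.
class InMemoryAccountService implements AccountService {
    private final List<Transaction> transactions = List.of(
            new Transaction("salary", 200_000),
            new Transaction("rent", -80_000),
            new Transaction("coffee", -350));

    @Override
    public List<Transaction> getLastTransactions(String accountNumber, int count) {
        int from = Math.max(0, transactions.size() - count);
        return transactions.subList(from, transactions.size());
    }

    @Override
    public long getBalance(String accountNumber) {
        long sum = 0;
        for (Transaction t : transactions) {
            sum += t.amountInCents;
        }
        return sum;
    }
}

// "Module 1": displays account info and knows nothing about storage.
class AccountView {
    private final AccountService service;

    AccountView(AccountService service) {
        this.service = service;
    }

    String summary(String accountNumber) {
        Transaction last = service.getLastTransactions(accountNumber, 1).get(0);
        return "Balance: " + service.getBalance(accountNumber)
                + " cents, last: " + last.description;
    }
}
```

Replacing the stub with a database-backed class would leave `AccountView` untouched, as long as the new class implements `AccountService`.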
At its core, this statement is really about dependencies. If I code my class Foo to an implementation (Bar instead of IBar) then Foo is now dependent on Bar. But if I code my class Foo to an interface (IBar instead of Bar) then the implementation can vary and Foo is no longer dependent on a specific implementation. This approach gives a flexible, loosely-coupled code base that is more easily reused, refactored and unit tested.
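A minimal sketch of that dependency argument, reusing the names Foo, Bar and IBar from the paragraph above (the methods `compute` and `run` are invented for illustration):

```java
// Foo depends only on IBar; Bar is one implementation among possibly many.
interface IBar {
    int compute(int x);
}

class Bar implements IBar {
    @Override
    public int compute(int x) {
        return x * 2;
    }
}

class Foo {
    private final IBar bar;

    // The implementation is handed in, so Foo never names Bar.
    Foo(IBar bar) {
        this.bar = bar;
    }

    int run(int x) {
        return bar.compute(x) + 1;
    }
}

class FooDemo {
    public static void main(String[] args) {
        System.out.println(new Foo(new Bar()).run(3)); // prints 7

        // In a unit test, a stub can stand in for the real implementation.
        IBar stub = x -> 0;
        System.out.println(new Foo(stub).run(3)); // prints 1
    }
}
```

The stub in the second call is what makes Foo easy to unit test in isolation.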
Take a red 2x4 Lego block and attach it to a blue 2x4 Lego block so one sits atop the other. Now remove the blue block and replace it with a yellow 2x4 Lego block. Notice that the red block did not have to change even though the "implementation" of the attached block varied.
Now go get some other kind of block that does not share the Lego "interface". Try to attach it to the red 2x4 Lego. To make this happen, you will need to change either the Lego or the other block, perhaps by cutting away some plastic or adding new plastic or glue. Notice that by varying the "implementation" you are forced to change it or the client.
Being able to let implementations vary without changing the client or the server - that is what it means to program to interfaces.
An interface is like a contract between you and the person who made the interface that your code will carry out what they request. Furthermore, you want to code things in such a way that your solution can solve the problem many times over. Think code re-use. When you are coding to an implementation, you are thinking purely of the instance of a problem that you are trying to solve. So when under this influence, your solutions will be less generic and more focused. That will make writing a general solution that abides by an interface much more challenging.
Look, I didn't realize this was for Java, and my code is based on C#, but I believe it gets the point across.
Every car has doors.
But not every door acts the same; in the UK, for example, the taxi doors are backwards. One universal fact is that they "Open" and "Close".
interface IDoor
{
void Open();
void Close();
}
class BackwardDoor : IDoor
{
public void Open()
{
// code to make the door open the "wrong way".
}
public void Close()
{
// code to make the door close properly.
}
}
class RegularDoor : IDoor
{
public void Open()
{
// code to make the door open the "proper way"
}
public void Close()
{
// code to make the door close properly.
}
}
class RedUkTaxiDoor : BackwardDoor
{
public Color Color
{
get
{
return Color.Red;
}
}
}
If you are a car door repairer, you don't care how the door looks, or whether it opens one way or the other. Your only requirement is that the door acts like a door, that is, like an IDoor.
class DoorRepairer
{
public void Repair(IDoor door)
{
door.Open();
// Do stuff inside the car.
door.Close();
}
}
The repairer can handle RedUkTaxiDoor, RegularDoor and BackwardDoor, and any other type of door, such as truck doors or limousine doors.
DoorRepairer repairer = new DoorRepairer();
repairer.Repair( new RegularDoor() );
repairer.Repair( new BackwardDoor() );
repairer.Repair( new RedUkTaxiDoor() );
Apply this to lists: you have ArrayList, the generic List<T>, and, if you want your own, MyList. They all implement the IList interface, which requires them to implement Add and Remove. (Stack, Queue and LinkedList<T> do not implement IList in .NET, so they cannot be passed here.) So if your class adds or removes items in any given list...
class ListAdder
{
public void PopulateWithSomething(IList list)
{
list.Add("one");
list.Add("two");
}
}
ArrayList arrayList = new ArrayList();
List<string> genericList = new List<string>();
ListAdder la = new ListAdder();
la.PopulateWithSomething(arrayList);
la.PopulateWithSomething(genericList);
Allen Holub wrote a great article for JavaWorld in 2003 on this topic called Why extends is evil. His take on the "program to the interface" statement, as you can gather from his title, is that you should happily implement interfaces, but very rarely use the extends keyword to subclass. He points to, among other things, what is known as the fragile base-class problem. From Wikipedia:
a fundamental architectural problem of object-oriented programming systems where base classes (superclasses) are considered "fragile" because seemingly safe modifications to a base class, when inherited by the derived classes, may cause the derived classes to malfunction. The programmer cannot determine whether a base class change is safe simply by examining in isolation the methods of the base class.
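A well-known illustration of this kind of coupling is a counting set built by subclassing versus one built by wrapping. The sketch below is not taken from Holub's article; the class names are hypothetical. The inherited version is only correct if you know whether the base class's addAll calls add internally, which is an implementation detail rather than part of the Set interface.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Inheritance: correctness depends on how HashSet implements addAll internally.
class InheritedCountingSet<E> extends HashSet<E> {
    int addCount = 0;

    @Override
    public boolean add(E e) {
        addCount++;
        return super.add(e);
    }

    @Override
    public boolean addAll(Collection<? extends E> c) {
        addCount += c.size();
        // If the base class calls add() for each element, every element is counted twice.
        return super.addAll(c);
    }
}

// Composition: depends only on the documented Set interface.
class WrappedCountingSet<E> {
    private final Set<E> delegate = new HashSet<>();
    int addCount = 0;

    boolean add(E e) {
        addCount++;
        return delegate.add(e);
    }

    boolean addAll(Collection<? extends E> c) {
        addCount += c.size();
        return delegate.addAll(c);
    }
}

class FragileBaseDemo {
    public static void main(String[] args) {
        InheritedCountingSet<String> inherited = new InheritedCountingSet<>();
        inherited.addAll(List.of("a", "b", "c"));
        System.out.println(inherited.addCount); // depends on the base class's internals

        WrappedCountingSet<String> wrapped = new WrappedCountingSet<>();
        wrapped.addAll(List.of("a", "b", "c"));
        System.out.println(wrapped.addCount); // prints 3
    }
}
```

On the JDKs I am aware of, the inherited counter ends up larger than 3 for this input, because the inherited addAll delegates to add. The wrapped version does not care either way.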
In addition to the other answers, I add more:
You program to an interface because it's easier to handle. The interface encapsulates the behavior of the underlying class. This way, the class is a blackbox. Your whole real life is programming to an interface. When you use a tv, a car, a stereo, you are acting on its interface, not on its implementation details, and you assume that if implementation changes (e.g. diesel engine or gas) the interface remains the same. Programming to an interface allows you to preserve your behavior when non-disruptive details are changed, optimized, or fixed. This simplifies also the task of documenting, learning, and using.
Also, programming to an interface allows you to delineate what is the behavior of your code before even writing it. You expect a class to do something. You can test this something even before you write the actual code that does it. When your interface is clean and done, and you like interacting with it, you can write the actual code that does things.
"Program to an interface" can be more flexible.
For example, suppose we are writing a class Printer which provides a print service. Currently there are 2 classes (Cat and Dog) that need to be printed, so we write code like below.
class Printer
{
public void PrintCat(Cat cat)
{
...
}
public void PrintDog(Dog dog)
{
...
}
...
}
What if a new class Bird also needs this print service? We have to change the Printer class to add a new method PrintBird(). In real cases, when we develop the Printer class, we may have no idea who will use it. So how should we write Printer? Programming to an interface can help; see the code below.
class Printer
{
public void Print(Printable p)
{
Bitmap bitmap = p.GetBitmap();
// print bitmap ...
}
}
With this new Printer, anything can be printed as long as it implements the interface Printable. Here the method GetBitmap() is just an example. The key thing is to expose an interface, not an implementation.
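For completeness, a Java sketch of the same idea with the interface spelled out. To keep it self-contained, a hypothetical `toPrintableText()` method stands in for GetBitmap(), and `Cat` and `Bird` are reduced to stubs.

```java
// Hypothetical stand-in for the Printable interface described above.
interface Printable {
    String toPrintableText(); // plays the role of GetBitmap()
}

class Cat implements Printable {
    @Override
    public String toPrintableText() {
        return "a cat";
    }
}

// Added later; Printer needs no change to support it.
class Bird implements Printable {
    @Override
    public String toPrintableText() {
        return "a bird";
    }
}

class Printer {
    String print(Printable p) {
        return "Printing " + p.toPrintableText();
    }
}

class PrinterDemo {
    public static void main(String[] args) {
        Printer printer = new Printer();
        System.out.println(printer.print(new Cat()));  // prints Printing a cat
        System.out.println(printer.print(new Bird())); // prints Printing a bird
    }
}
```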
Hope it's helpful.
Essentially, interfaces are the slightly more concrete representation of general concepts of interoperation - they provide the specification for what all the various options you might care to "plug in" for a particular function should do similarly so that code which uses them won't be dependent on one particular option.
For instance, many DB libraries act as interfaces in that they can operate with many different actual DBs (MSSQL, MySQL, PostgreSQL, SQLite, etc.) without the code that uses the DB library having to change at all.
Overall, it allows you to create code that's more flexible - giving your clients more options on how they use it, and also potentially allowing you to more easily reuse code in multiple places instead of having to write new specialized code.
By programming to an interface, you are more likely to apply the low coupling / high cohesion principle.
By programming to an interface, you can easily switch the implementation of that interface (the specific class).
It means that your variables, properties, parameters and return types should have an interface type instead of a concrete implementation.
Which means you use IEnumerable<T> Foo(IList mylist) instead of ArrayList Foo(ArrayList myList) for example.
Use the implementation only when constructing the object:
IList list = new ArrayList();
If you have done this, you can later change the object type. Maybe you want to use LinkedList instead of ArrayList later on; this is no problem, since everywhere else you refer to it as just "IList".
It's basically where you make a method/interface like this: create( 'apple' ), where the method create(param) comes from an abstract class/interface fruit that is later implemented by concrete classes. This is different from subclassing. You are creating a contract that classes must fulfill. This also reduces coupling and makes things more flexible, since each concrete class implements it differently.
The client code remains unaware of the specific types of objects used and remains unaware of the classes that implement these objects. Client code only knows about the interface create(param) and it uses it to make fruit objects. It's like saying, "I don't care how you get it or make it, I just want you to give it to me."
An analogy to this is a set of on and off buttons. That is an interface on() and off(). You can use these buttons on several devices, a TV, radio, light. They all handle them differently but we don't care about that, all we care about is to turn it on or turn it off.
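The on/off analogy might be written like this in Java. The `Switchable` interface and the device classes are hypothetical names chosen for the sketch.

```java
import java.util.List;

// The "buttons": the only thing client code knows about.
interface Switchable {
    void on();

    void off();
}

class Television implements Switchable {
    String state = "standby";

    @Override
    public void on() {
        state = "showing channel 1";
    }

    @Override
    public void off() {
        state = "standby";
    }
}

class Lamp implements Switchable {
    boolean lit = false;

    @Override
    public void on() {
        lit = true;
    }

    @Override
    public void off() {
        lit = false;
    }
}

class RemoteControl {
    // Works for any device, including ones written after this class.
    static void turnEverythingOn(List<Switchable> devices) {
        for (Switchable device : devices) {
            device.on();
        }
    }
}

class SwitchDemo {
    public static void main(String[] args) {
        Television tv = new Television();
        Lamp lamp = new Lamp();
        RemoteControl.turnEverythingOn(List.of(tv, lamp));
        System.out.println(tv.state); // prints showing channel 1
        System.out.println(lamp.lit); // prints true
    }
}
```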
Coding to an interface is a philosophy, rather than specific language constructs or design patterns - it instructs you what is the correct order of steps to follow in order to create better software systems (e.g. more resilient, more testable, more scalable, more extendible, and other nice traits).
What it actually means is:
===
Before jumping to implementations and coding (the HOW) - think of the WHAT:
What black boxes should make up your system,
What is each box's responsibility,
What are the ways each "client" (that is, one of those other boxes, 3rd party "boxes", or even humans) should communicate with it (the API of each box).
After you figure the above, go ahead and implement those boxes (the HOW).
Thinking first about what a box is and what its API is leads the developer to distil the box's responsibility, and to mark for himself and future developers the difference between its exposed details ("API") and its hidden details ("implementation details"), which is a very important distinction to have.
One immediate and easily noticeable gain is the team can then change and improve implementations without affecting the general architecture. It also makes the system MUCH more testable (it goes well with the TDD approach).
===
Beyond the traits I've mentioned above, you also save A LOT OF TIME going this direction.
Micro Services and DDD, when done right, are great examples of "Coding to an interface", however the concept wins in every pattern from monoliths to "serverless", from BE to FE, from OOP to functional, etc....
I strongly recommend this approach for Software Engineering (and I basically believe it makes total sense in other fields as well).
Related
when programming in Java I practically always, just out of habit, write something like this:
public List<String> foo() {
return new ArrayList<String>();
}
Most of the time without even thinking about it. Now, the question is: should I always specify the interface as the return type? Or is it advisable to use the actual implementation of the interface, and if so, under what circumstances?
It is obvious that using the interface has a lot of advantages (that's why it's there). In most cases it doesn't really matter what concrete implementation is used by a library function. But maybe there are cases where it does matter. For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList:
List bar = foo();
List myList = bar instanceof LinkedList ? new ArrayList(bar) : bar;
but that just seems horrible and my coworkers would probably lynch me in the cafeteria. And rightfully so.
What do you guys think? What are your guidelines, when do you tend towards the abstract solution, and when do you reveal details of your implementation for potential performance gains?
Return the appropriate interface to hide implementation details. Your clients should only care about what your object offers, not how you implemented it. If you start with a private ArrayList, and decide later on that something else (e.g., LinkedList, skip list, etc.) is more appropriate, you can change the implementation without affecting clients if you return the interface. The moment you return a concrete type the opportunity is lost.
For instance, if I know that I will
primarily access the data in the list
randomly, a LinkedList would be bad.
But if my library function only
returns the interface, I simply don't
know. To be on the safe side I might
even need to copy the list explicitly
over to an ArrayList.
As everybody else has mentioned, you just mustn't care how the library has implemented the functionality; this reduces coupling and increases the maintainability of the library.
If you, as a library client, can demonstrate that the implementation is performing badly for your use case, you can then contact the person in charge and discuss the best path to follow (a new method for this case or just changing the implementation).
That said, your example reeks of premature optimization.
If the method is or can be critical, it might mention the implementation details in the documentation.
Without being able to justify it with reams of CS quotes (I'm self taught), I've always gone by the mantra of "Accept the least derived, return the most derived," when designing classes and it has stood me well over the years.
I guess what that means for interface versus concrete return types is that if you are trying to reduce dependencies and/or decouple, returning the interface is generally more useful. However, if the concrete class implements more than that interface, it is usually more useful to the callers of your method to get the concrete class back (i.e. the "most derived") rather than arbitrarily restrict them to a subset of that returned object's functionality - unless you actually need to restrict them. Then again, you could also just increase the coverage of the interface. Needless restrictions like this I compare to thoughtless sealing of classes; you never know. Just to talk a bit about the former part of that mantra (for other readers), accepting the least derived also gives maximum flexibility for callers of your method.
-Oisin
Sorry to disagree, but I think the basic rule is as follows:
For input arguments use the most generic.
For output values, the most specific.
So, in this case you want to declare the implementation as:
public ArrayList<String> foo() {
return new ArrayList<String>();
}
Rationale:
The input case is already known and explained by everyone: use the interface, period. However, the output case can look counter-intuitive.
You want to return the implementation because you want the client to have the most information about what it is receiving. In this case, more knowledge is more power.
Example 1: the client wants to get the 5th element:
return Collection: must iterate until the 5th element
return List: list.get(4)
Example 2: the client wants to remove the 5th element:
return List: must create a new list without the specified element (list.remove() is optional).
return ArrayList: arrayList.remove(4)
So it's a big truth that using interfaces is great because it promotes reusability, reduces coupling, improves maintainability and makes people happy ... but only when used as input.
So, again, the rule can be stated as:
Be flexible for what you offer.
Be informative with what you deliver.
So, next time, please return the implementation.
In OO programming, we want to encapsulate as much as possible the data. Hide as much as possible the actual implementation, abstracting the types as high as possible.
In this context, I would answer: only return what is meaningful. Does it make sense at all for the return value to be the concrete class? That is, in your example, ask yourself: will anyone use a LinkedList-specific method on the return value of foo?
If no, just use the higher-level Interface. It's much more flexible, and allows you to change the backend
If yes, ask yourself: can't I refactor my code to return the higher-level interface? :)
The more abstract your code is, the fewer changes you are required to make when changing a backend. It's as simple as that.
If, on the other hand, you end up casting the return values to the concrete class, well that's a strong sign that you should probably return instead the concrete class. Your users/teammates should not have to know about more or less implicit contracts: if you need to use the concrete methods, just return the concrete class, for clarity.
In a nutshell: code abstract, but explicitly :)
In general, for a public facing interface such as APIs, returning the interface (such as List) over the concrete implementation (such as ArrayList) would be better.
The use of a ArrayList or LinkedList is an implementation detail of the library that should be considered for the most common use case of that library. And of course, internally, having private methods handing off LinkedLists wouldn't necessarily be a bad thing, if it provides facilities that would make the processing easier.
There is no reason that a concrete class shouldn't be used in the implementation, unless there is a good reason to believe that some other List class would be used later on. But then again, changing the implementation details shouldn't be as painful as long as the public facing portion is well-designed.
The library itself should be a black box to its consumers, so they don't really have to worry about what's going on internally. That also means that the library should be designed so that it is used in the way it is intended.
It doesn't matter all that much whether an API method returns an interface or a concrete class; despite what everyone here says, you almost never change the implementation class once the code is written.
What's far more important: always use minimum-scope interfaces for your method parameters! That way, clients have maximal freedom and can use classes your code doesn't even know about.
When an API method returns ArrayList, I have absolutely no qualms with that, but when it demands an ArrayList (or, all too common, Vector) parameter, I consider hunting down the programmer and hurting him, because it means that I can't use Arrays.asList(), Collections.singletonList() or Collections.EMPTY_LIST.
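A small sketch of why the parameter type matters. The `Totals.sum` method is a hypothetical example; because it asks only for a `List`, callers can pass the lightweight lists mentioned above, none of which is an `ArrayList`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class Totals {
    // Accepts any List, so callers are free to choose how they build it.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }
}

class TotalsDemo {
    public static void main(String[] args) {
        System.out.println(Totals.sum(Arrays.asList(1, 2, 3)));           // prints 6
        System.out.println(Totals.sum(Collections.singletonList(5)));     // prints 5
        System.out.println(Totals.sum(Collections.<Integer>emptyList())); // prints 0
    }
}
```

Had the signature been `sum(ArrayList<Integer> values)`, none of the three calls would compile without first copying into an `ArrayList`. Since the method only iterates, an even narrower parameter type such as `Iterable<Integer>` would also do.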
As a rule, I only pass back internal implementations if I am in some private, inner workings of a library, and even so only sparingly. For everything that is public and likely to be called from the outside of my module I use interfaces, and also the Factory pattern.
Using interfaces in such a way has proven to be a very reliable way to write reusable code.
The main question has been answered already and you should always use the interface. I however would just like to comment on
It is obvious that using the interface has a lot of advantages (that's why it's there). In most cases it doesn't really matter what concrete implementation is used by a library function. But maybe there are cases where it does matter. For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList.
If you are returning a data structure that you know has poor random access performance -- O(n) and typically a LOT of data -- there are other interfaces you should be specifying instead of List, like Iterable so that anyone using the library will be fully aware that only sequential access is available.
Picking the right type to return isn't just about interface versus concrete implementation, it is also about selecting the right interface.
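As a sketch of that idea, a hypothetical `EventLog` backed by a linked list could expose its contents as an `Iterable`, so there is no `get(i)` for callers to misuse:

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Hypothetical class; the backing list has slow random access.
class EventLog {
    private final List<String> events = new LinkedList<>();

    void record(String event) {
        events.add(event);
    }

    // Iterable advertises sequential access only.
    Iterable<String> events() {
        return Collections.unmodifiableList(events);
    }
}

class EventLogDemo {
    public static void main(String[] args) {
        EventLog log = new EventLog();
        log.record("start");
        log.record("stop");
        for (String event : log.events()) {
            System.out.println(event);
        }
    }
}
```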
You use interface to abstract away from the actual implementation. The interface is basically just a blueprint for what your implementation can do.
Interfaces are good design because they allow you to change implementation details without having to fear that any of its consumers are directly affected, as long as your implementation still does what your interface says it does.
To work with interfaces you would instantiate them like this:
IParser parser = new Parser();
Now IParser would be your interface, and Parser would be your implementation. Now when you work with the parser object from above, you will work against the interface (IParser), which in turn will work against your implementation (Parser).
That means that you can change the inner workings of Parser as much as you want, it will never affect code that works against your IParser parser interface.
In general, use the interface in all cases if you have no need of the functionality of the concrete class. Note that for lists, Java has added a RandomAccess marker interface primarily to distinguish a common case where an algorithm may need to know if get(i) is constant time or not.
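A minimal sketch of how an algorithm can consult `java.util.RandomAccess` while still accepting the `List` interface. The `Lists.sum` helper is a hypothetical example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

class Lists {
    // Sums via get(i) when the list says that is cheap, otherwise via the iterator.
    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            for (int i = 0; i < list.size(); i++) {
                total += list.get(i);
            }
        } else {
            for (int value : list) {
                total += value;
            }
        }
        return total;
    }
}

class RandomAccessDemo {
    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>(List.of(1, 2, 3));
        List<Integer> linkedList = new LinkedList<>(List.of(1, 2, 3));
        System.out.println(Lists.sum(arrayList));  // prints 6
        System.out.println(Lists.sum(linkedList)); // prints 6
    }
}
```

The caller never has to copy a `LinkedList` into an `ArrayList` "to be on the safe side"; the method adapts on its own.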
For uses of code, Michael above is right that being as generic as possible in the method parameters is often even more important. This is especially true when testing such a method.
You'll find (or have found) that as you return interfaces, they permeate through your code. e.g. you return an interface from method A and you have to then pass an interface to method B.
What you're doing is programming by contract, albeit in a limited fashion.
This gives you enormous scope to change implementations under the covers (provided these new objects fulfill the existing contracts/expected behaviours).
Given all of this, you have benefits in terms of choosing your implementation, and how you can substitute behaviours (including testing - using mocking, for example). In case you hadn't guessed, I'm all in favour of this and try to reduce to (or introduce) interfaces wherever possible.
when programming in Java I practically always, just out of habit, write something like this:
public List<String> foo() {
return new ArrayList<String>();
}
Most of the time without even thinking about it. Now, the question is: should I always specify the interface as the return type? Or is it advisable to use the actual implementation of the interface, and if so, under what circumstances?
It is obvious that using the interface has a lot of advantages (that's why it's there). In most cases it doesn't really matter what concrete implementation is used by a library function. But maybe there are cases where it does matter. For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList:
List bar = foo();
List myList = bar instanceof LinkedList ? new ArrayList(bar) : bar;
but that just seems horrible and my coworkers would probably lynch me in the cafeteria. And rightfully so.
What do you guys think? What are your guidelines, when do you tend towards the abstract solution, and when do you reveal details of your implementation for potential performance gains?
Return the appropriate interface to hide implementation details. Your clients should only care about what your object offers, not how you implemented it. If you start with a private ArrayList and decide later on that something else (e.g., LinkedList, skip list, etc.) is more appropriate, you can change the implementation without affecting clients if you return the interface. The moment you return a concrete type, the opportunity is lost.
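A minimal sketch of that idea, using a hypothetical BuddyList class (the name and methods are assumptions, not anything from the question):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class for illustration only.
class BuddyList {
    // Implementation detail: this could become a LinkedList later
    // without any caller having to change.
    private final List<String> names = new ArrayList<>();

    void add(String name) {
        names.add(name);
    }

    // Callers only ever see List, so the backing type stays private.
    // The read-only view is an extra design choice, not required by the argument.
    List<String> getNames() {
        return Collections.unmodifiableList(names);
    }
}
```

Swapping new ArrayList<>() for new LinkedList<>() touches one line inside the class and nothing outside it.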
For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList.
As everybody else has mentioned, you shouldn't care how the library has implemented the functionality; that reduces coupling and increases the maintainability of the library.
If you, as a library client, can demonstrate that the implementation is performing badly for your use case, you can then contact the person in charge and discuss the best path to follow (a new method for this case or just changing the implementation).
That said, your example reeks of premature optimization.
If the method is or can be critical, it might mention the implementation details in the documentation.
Without being able to justify it with reams of CS quotes (I'm self taught), I've always gone by the mantra of "Accept the least derived, return the most derived," when designing classes and it has stood me well over the years.
I guess what that means for interface versus concrete return types is that, if you are trying to reduce dependencies and/or decouple, returning the interface is generally more useful. However, if the concrete class implements more than that interface, it is usually more useful to the callers of your method to get the concrete class back (i.e. the "most derived") rather than arbitrarily restrict them to a subset of that returned object's functionality - unless you actually need to restrict them. Then again, you could also just increase the coverage of the interface. Needless restrictions like this I compare to thoughtless sealing of classes; you never know. Just to talk a bit about the former part of that mantra (for other readers), accepting the least derived also gives maximum flexibility for callers of your method.
-Oisin
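To make the mantra concrete, here is a small sketch. The method name upperCaseAll is invented for illustration; the point is only the shape of the signature.

```java
import java.util.ArrayList;

// Hypothetical example of "accept the least derived, return the most derived".
class MantraDemo {
    // The parameter is Iterable, the least derived type that still offers
    // what the method needs (iteration). The return type is the concrete
    // ArrayList the method actually builds.
    static ArrayList<String> upperCaseAll(Iterable<String> words) {
        ArrayList<String> result = new ArrayList<>();
        for (String word : words) {
            result.add(word.toUpperCase());
        }
        return result;
    }
}
```

Callers can pass a List, a Set or any other Iterable, and those who want ArrayList-specific methods such as trimToSize() get them without a cast. The trade-off, as other answers point out, is that the return type is now part of the contract.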
Sorry to disagree, but I think the basic rule is as follows:
For input arguments use the most generic.
For output values, the most specific.
So, in this case you want to declare the implementation as:
public ArrayList<String> foo() {
return new ArrayList<String>();
}
Rationale:
The input case is already known and explained by everyone: use the interface, period. However, the output case can look counter-intuitive.
You want to return the implementation because you want the client to have the most information about what it is receiving. In this case, more knowledge is more power.
Example 1: the client wants to get the 5th element:
return Collection: must iterate until the 5th element
return List: list.get(4)
Example 2: the client wants to remove the 5th element:
return List: must create a new list without the specified element (list.remove() is optional).
return ArrayList: arrayList.remove(4)
So it's a big truth that using interfaces is great because it promotes reusability, reduces coupling, improves maintainability and makes people happy ... but only when used as input.
So, again, the rule can be stated as:
Be flexible for what you offer.
Be informative with what you deliver.
So, next time, please return the implementation.
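The two examples above can be sketched roughly as follows. The class and method names are made up; the behaviour of Arrays.asList (a fixed-size list that rejects remove) is standard library behaviour.

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper illustrating what each return type lets a client do.
class ReturnTypeDemo {
    // With only a Collection, reaching the 5th element means walking the iterator.
    static String fifthViaCollection(Collection<String> items) {
        Iterator<String> it = items.iterator();
        for (int i = 0; i < 4; i++) {
            it.next();
        }
        return it.next();
    }

    // With a List, positional access is part of the contract.
    static String fifthViaList(List<String> items) {
        return items.get(4);
    }

    // List.remove(int) is an optional operation: fixed-size lists reject it.
    static boolean canRemoveFifth(List<String> items) {
        try {
            items.remove(4);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }
}
```

Passing Arrays.asList("a", "b", "c", "d", "e") to canRemoveFifth returns false, while a new ArrayList built from the same elements returns true, which is the gap this answer is pointing at.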
In OO programming, we want to encapsulate as much as possible the data. Hide as much as possible the actual implementation, abstracting the types as high as possible.
In this context, I would answer: only return what is meaningful. Does it make sense at all for the return value to be the concrete class? In your example, ask yourself: will anyone use a LinkedList-specific method on the return value of foo?
If no, just use the higher-level interface. It's much more flexible, and allows you to change the backend.
If yes, ask yourself: can't I refactor my code to return the higher-level interface? :)
The more abstract your code is, the fewer changes you are required to make when changing a backend. It's as simple as that.
If, on the other hand, you end up casting the return values to the concrete class, well that's a strong sign that you should probably return instead the concrete class. Your users/teammates should not have to know about more or less implicit contracts: if you need to use the concrete methods, just return the concrete class, for clarity.
In a nutshell: code abstract, but explicitly :)
In general, for a public facing interface such as APIs, returning the interface (such as List) over the concrete implementation (such as ArrayList) would be better.
The use of an ArrayList or LinkedList is an implementation detail of the library that should be chosen for the most common use case of that library. And of course, internally, having private methods handing off LinkedLists wouldn't necessarily be a bad thing, if it provides facilities that would make the processing easier.
There is no reason that a concrete class shouldn't be used in the implementation, unless there is a good reason to believe that some other List class would be used later on. But then again, changing the implementation details shouldn't be as painful as long as the public facing portion is well-designed.
The library itself should be a black box to its consumers, so they don't really have to worry about what's going on internally. That also means that the library should be designed so that it is used in the way it is intended.
It doesn't matter all that much whether an API method returns an interface or a concrete class; despite what everyone here says, you almost never change the implementation class once the code is written.
What's far more important: always use minimum-scope interfaces for your method parameters! That way, clients have maximal freedom and can use classes your code doesn't even know about.
When an API method returns ArrayList, I have absolutely no qualms with that, but when it demands an ArrayList (or, all too common, Vector) parameter, I consider hunting down the programmer and hurting him, because it means that I can't use Arrays.asList(), Collections.singletonList() or Collections.EMPTY_LIST.
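A short sketch of why the parameter type matters. The method totalLength is hypothetical; the three factory calls are the standard java.util ones (Collections.emptyList() being the typed counterpart of EMPTY_LIST).

```java
import java.util.List;

// Hypothetical API method for illustration.
class ParameterDemo {
    // Accepting List rather than ArrayList lets callers pass any List implementation.
    static int totalLength(List<String> words) {
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return total;
    }
}
```

With this signature, totalLength(Arrays.asList("ab", "c")), totalLength(Collections.singletonList("abc")) and totalLength(Collections.<String>emptyList()) all compile. Had the parameter been declared as ArrayList<String>, none of them would, because none of those factories returns an ArrayList.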
As a rule, I only pass back internal implementations if I am in some private, inner workings of a library, and even so only sparingly. For everything that is public and likely to be called from the outside of my module I use interfaces, and also the Factory pattern.
Using interfaces in such a way has proven to be a very reliable way to write reusable code.
The main question has been answered already and you should always use the interface. I however would just like to comment on
It is obvious that using the interface has a lot of advantages (that's why it's there). In most cases it doesn't really matter what concrete implementation is used by a library function. But maybe there are cases where it does matter. For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList.
If you are returning a data structure that you know has poor random access performance -- O(n) and typically a LOT of data -- there are other interfaces you should be specifying instead of List, like Iterable so that anyone using the library will be fully aware that only sequential access is available.
Picking the right type to return isn't just about interface versus concrete implementation, it is also about selecting the right interface.
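As an illustration of choosing a narrower interface, the hypothetical EventLog below keeps a LinkedList internally and exposes it only as Iterable, so the signature itself says "sequential access only".

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical class for illustration only.
class EventLog {
    // Backed by a LinkedList, where get(i) would be O(n).
    private final List<String> events = new LinkedList<>();

    void record(String event) {
        events.add(event);
    }

    // Iterable offers traversal but no get(i), so callers cannot
    // accidentally write index-based loops against this data.
    Iterable<String> events() {
        return events;
    }
}
```

A caller simply writes for (String e : log.events()) { ... }. A determined caller could still cast the result, so this documents intent rather than enforcing it.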
You use interface to abstract away from the actual implementation. The interface is basically just a blueprint for what your implementation can do.
Interfaces are good design because they allow you to change implementation details without having to fear that any of their consumers are directly affected, as long as your implementation still does what your interface says it does.
To work with interfaces you would instantiate them like this:
IParser parser = new Parser();
Now IParser would be your interface, and Parser would be your implementation. Now when you work with the parser object from above, you will work against the interface (IParser), which in turn will work against your implementation (Parser).
That means that you can change the inner workings of Parser as much as you want, it will never affect code that works against your IParser parser interface.
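To flesh that out a little: IParser and Parser are the names used above, but the parse method and the ParserClient class are assumptions added so the sketch compiles.

```java
// The single parse method is an assumed contract for illustration.
interface IParser {
    int parse(String text);
}

// One possible implementation; its internals can change freely.
class Parser implements IParser {
    @Override
    public int parse(String text) {
        return Integer.parseInt(text.trim());
    }
}

// Hypothetical consumer that only knows about the interface.
class ParserClient {
    static int parseAndDouble(IParser parser, String text) {
        return 2 * parser.parse(text);
    }
}
```

ParserClient never mentions Parser, so rewriting Parser, or passing a different IParser altogether (a test stub, for instance), requires no change to it.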
This is a question of curiosity about accepted coding practices. I'm (primarily) a Java developer, and have been increasingly making efforts to unit test my code. I've spent some time looking at how to write the most testable code, paying particular attention to Google's How to write untestable code guide (well worth a look, if you haven't seen it).
Naturally, I was arguing recently with a more C++-oriented friend about the advantages of each language's inheritance model, and I thought I'd pull out a trump card by saying how much harder C++ programmers made it to test their code by constantly forgetting the virtual keyword (for C++ers - this is the default in Java; you get rid of it using final).
I posted a code example that I thought would demonstrate the advantages of Java's model quite well (the full thing is over on GitHub). The short version:
class MyClassForTesting {
    private final Database mDatabase;
    private final Api mRemoteApi;
    void myFunctionForTesting() {
        for (User u : mDatabase.getUsers()) {
            mRemoteApi.updateUserData(u);
        }
    }
    MyClassForTesting(Database usersDatabase, Api remoteApi) {
        mDatabase = usersDatabase;
        mRemoteApi = remoteApi;
    }
}
Regardless of the quality of what I've written here, the idea is that the class needs to make some (potentially quite expensive) calls to a database, and some API (maybe on a remote web server). myFunctionForTesting() doesn't have a return type, so how do you unit test this? In Java, I think the answer isn't too difficult - we mock:
/*** Tests ***/
/*
 * This will record some stuff and we'll check it later to see that
 * the things we expect really happened.
 */
final ActionRecorder ar = new ActionRecorder();
/** Mock up some classes **/
// Anonymous classes cannot declare constructors; they capture ar from the
// enclosing scope. This assumes Database and Api have no-argument constructors.
// Needs java.util.Arrays, java.util.HashSet and java.util.Set.
Database mockDatabase = new Database() {
    @Override
    public Set<User> getUsers() {
        ar.recordAction("got list of users");
        return new HashSet<User>(Arrays.asList(new User("Jim"), new User("Kyle")));
    }
};
Api mockApi = new Api() {
    @Override
    public void updateUserData(User u) {
        ar.recordAction("Updated user data for " + u.name());
    }
};
/** Carry out the tests with the mocked up classes **/
MyClassForTesting testObj = new MyClassForTesting(mockDatabase, mockApi);
testObj.myFunctionForTesting();
// Check that it really fetches users from the database
assert ar.contains("got list of users");
// Check that it is checking the users we passed it
assert ar.contains("Updated user data for Jim");
assert ar.contains("Updated user data for Kyle");
By mocking up these classes, we inject the dependencies with our own light-weight versions that we can make assertions on for unit testing, and avoid making expensive, time-consuming calls to database/api-land. The designers of Database and Api don't have to be too aware that this is what we're going to do, and the designer of MyClassForTesting certainly doesn't have to know! This seems (to me) like a pretty good way to do things.
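For readers who want something they can compile, here is a self-contained sketch of the same idea. It assumes Database and Api are interfaces and invents a minimal User class, since the question does not show them; a plain list of strings stands in for ActionRecorder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Assumed shapes: the question does not show User, Database or Api.
class User {
    private final String name;
    User(String name) { this.name = name; }
    String name() { return name; }
}

interface Database { Set<User> getUsers(); }

interface Api { void updateUserData(User u); }

class MyClassForTesting {
    private final Database mDatabase;
    private final Api mRemoteApi;

    MyClassForTesting(Database usersDatabase, Api remoteApi) {
        mDatabase = usersDatabase;
        mRemoteApi = remoteApi;
    }

    void myFunctionForTesting() {
        for (User u : mDatabase.getUsers()) {
            mRemoteApi.updateUserData(u);
        }
    }
}

class MockingDemo {
    // Runs the class under test against hand-written mocks and
    // returns the actions those mocks recorded, in order.
    static List<String> runWithMocks() {
        List<String> actions = new ArrayList<>();
        Database mockDatabase = () -> {
            actions.add("got list of users");
            // LinkedHashSet keeps insertion order, so the log is deterministic.
            return new LinkedHashSet<>(Arrays.asList(new User("Jim"), new User("Kyle")));
        };
        Api mockApi = u -> actions.add("Updated user data for " + u.name());
        new MyClassForTesting(mockDatabase, mockApi).myFunctionForTesting();
        return actions;
    }
}
```

Because both dependencies are single-method interfaces here, the mocks can be lambdas; with abstract or concrete base classes you would use anonymous subclasses as in the question.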
My C++ friend, however, retorted that this was a dreadful hack, and there's a good reason C++ won't let you do this! He then presented a solution based on Generics, which does much the same thing. For brevity's sake, I'll just list a part of the solution he gave, but again you can find the whole thing over on Github.
template<typename A, typename D>
class MyClassForTesting {
private:
    A mApi;
    D mDatabase;
public:
    MyClassForTesting(D database, A api) {
        mApi = api;
        mDatabase = database;
    }
    ...
};
Which would then be tested much like before, but with the important bits that get replaced shown below:
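class MockDatabase : Database {
    ...
};
class MockApi : Api {
    ...
};
// Arguments follow the constructor's (database, api) order; braces keep the
// line from being parsed as a function declaration.
MyClassForTesting<MockApi, MockDatabase>
    testingObj{MockDatabase(ar), MockApi(ar)};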
So my question is this: What's the preferred method? I always thought the polymorphism-based approach was better - and I see no reason it wouldn't be in Java - but is it normally considered better to use Generics than Virtualise everything in C++? What do you do in your code (assuming you do unit test) ?
I'm probably biased, but I'd say the C++ version is better. Among other things, polymorphism carries some cost. In this case, you're making your users pay that cost, even though they receive no direct benefit from it.
If, for example, you had a list of polymorphic objects, and want to manipulate all of them via the base class, that would justify using polymorphism. In this case, however, the polymorphism is being used for something the user never even sees. You've built in the ability to manipulate polymorphic objects, but never really used it -- for testing you'll only have mock objects, and for real use you'll only have real objects. There will never be a time that you have (for example) an array of database objects, some of which are mock databases and others of which are real databases.
This is also much more than just an efficiency issue (or at least a run-time efficiency issue). The relationships in your code should be meaningful. When somebody sees (public) inheritance, that should tell them something about the design. As you've outlined it in Java, however, the public inheritance relationship involved is basically a lie -- i.e. what he should know from it (that you're dealing with polymorphic descendants) is an outright falsehood. The C++ code, by contrast, correctly conveys the intent to the reader.
To an extent, I'm overstating the case there, of course. People who normally read Java are almost certainly well accustomed to the way inheritance is typically abused, so they don't see this as a lie at all. This is a bit of throwing out the baby with the bathwater though -- instead of seeing the "lie" for what it is, they've learned to completely ignore what inheritance really means (or just never knew, especially if they went to college where Java was the primary vehicle for teaching OOP). As I said, I'm probably somewhat biased, but to me this makes (most) Java code much more difficult to understand. You basically have to be careful to ignore the basic principles of OOP, and get accustomed to its constant abuse.
Some key advice is "prefer composition to inheritance", which is what your MyClassForTesting has done with respect to the Database and Api. This is good C++ advice too: IIRC it is in Effective C++.
It is a bit rich for your friend to claim that using polymorphism is a "dreadful hack" but using templates is not. On what basis does (s)he claim that one is less hacky than the other? I see none, and I use both all the time in my C++ code.
I'd say the polymorphism approach (as you have done) is better. Consider that Database and Api might be interfaces. In that case you are explicitly declaring the API used by MyClassForTesting: someone can read the Api.java and Database.java files. And you are loosely coupling the modules: the Api and Database interfaces will naturally be the narrowest acceptable interfaces, much narrower than the public interface of any concrete class that implements them.
More importantly, you cannot create templated virtual functions. This makes it impossible to test functions in C++ which use templates, by using inheritance, and therefore testing by inheritance in C++ is unreliable as you cannot test all classes that way, and definitely not every use of a base class can be substituted with that of a derived class, especially w.r.t instantiating templates of them. Of course, templates introduce their own problems, but I think that's beyond the scope of the question.
You're throwing inheritance at the problem but really it's not the right solution- you only need to change between the mock and the real at compile time, not at run time. This fundamental fact makes templates the better option.
In C++, we don't forget the virtual keyword, we just don't need it, because run-time polymorphism should only occur when you need to vary the type at run-time. Else, you're firing a rocket launcher at a nail.
I'm rather new to Java. After just reading some info on path finding, I read about using an empty class as an "interface", for an unknown object type.
I'm developing a game in Java based on hospital theme. So far, the user can build a reception desk and a GP's office. They are two different types of object, one is a Building and one is a ReceptionDesk. (In my class structure.)
My class structure is this:
GridObject-->Building
GridObject-->Item-->usableItem-->ReceptionDesk.
The problem comes when the usable item can be rotated and the building cannot. The mouse click event is on the grid, so calls the same method. The GP's office is a Building and the reception desk is a ReceptionDesk. Only the ReceptionDesk has the method rotate. When right clicking on the grid, if in building mode, I have to use this "if" statement:
if (currentBuilding.getClass().equals(ReceptionDesk.class))
I then have to create a new ReceptionDesk, use the rotate method, and then put that reception desk back into the currentBuilding GridObject.
I'm not sure if I'm explaining myself very well with this question. Sorry. I am still quite new to Java. I will try to answer any questions and I can post more code snippets if need be. I didn't know that there might be a way around the issue of not knowing the class of the object, however I may also be going about it the wrong way.
I hadn't planned on looking into this until I saw how fast and helpful the replies on this site were! :)
Thanks in advance.
Rel
You don't want to check the class of an object before doing something with it in your case. You should be using polymorphism. You want to have the interface define some methods. Each class implements those methods. Refer to the object by its interface, and have the individual implementations of those objects return their values to the caller.
If you describe a few more of the objects you think you need, people here will have opinions on how you should lay them out. But from what you've provided, you may want a "Building" interface that defines some general methods. You may also want a "UsableItem" interface or something more generic. Hospital could be a class that implements building. ReceptionDesk could implement UsableItem. Building could have a grid of UsableItem inside it.
If rotate() was a common method to all furniture that actually did some work, you may consider making an AbstractUsableItem class that was an abstract class implementing UsableItemand providing the rotate() method. If rotate was different in each implementing class, you would have that method in the interface, but each class, like ReceptionDesk would do its own thing with the rotate() method. Your code would do something like:
UsableItem desk = new ReceptionDesk();
desk.rotate();
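To make the first option concrete, here is a minimal sketch of the abstract-class approach. Everything beyond the names UsableItem, AbstractUsableItem, ReceptionDesk and rotate() is an assumption for illustration: the getRotationDegrees() method, the 90-degree step and the RotateDemo class are made up, not taken from the question.

```java
// Hypothetical sketch: getRotationDegrees() and the 90-degree step are assumptions.
interface UsableItem {
    void rotate();
    int getRotationDegrees();
}

// Shared rotate() logic lives here, so each item does not have to repeat it.
abstract class AbstractUsableItem implements UsableItem {
    private int rotationDegrees = 0;

    @Override
    public void rotate() {
        // Quarter turn clockwise, wrapping back to 0 after 270.
        rotationDegrees = (rotationDegrees + 90) % 360;
    }

    @Override
    public int getRotationDegrees() {
        return rotationDegrees;
    }
}

// Inherits rotate(); desk-specific behaviour would be added here.
class ReceptionDesk extends AbstractUsableItem {
}

class RotateDemo {
    public static void main(String[] args) {
        // The caller only knows about the interface, not the concrete class.
        UsableItem desk = new ReceptionDesk();
        desk.rotate();
        System.out.println(desk.getRotationDegrees()); // prints 90
    }
}
```

If each item needed to rotate differently, you would drop the abstract class and let every implementing class write its own rotate().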
In your example, if your mouse click on a screen rotated the object under it, and you really did need to check to see if the object could be rotated before doing something like that, you'd do
if (clickedObject instanceof UsableItem) {
((UsableItem) clickedObject).rotate();
}
where UsableItem was the interface or abstract class. Some people feel that all design should be done via an interface contract and suggest an interface for every type of class, but I don't know if you have to go that far.
You might consider moving in a totally different direction and having the objects themselves decide what kind of action to take. For example, the GridObject interface might declare methods such as handleRightClick(), handleLeftClick(), etc. What you'd be saying in that case is "any class that calls itself a GridObject needs to specify what happens when it is right-clicked".
So, within the Building class, you might implement handleRightClick to do nothing (or to return an error). Within the ReceptionDesk class, you would implement handleRightClick to rotate the desk.
Your code snippet would then become:
currentBuilding.handleRightClick(... any necessary parameters ...);
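As a rough sketch of how that could look, assuming handleRightClick() needs no parameters (your real version probably will), and assuming a simple 90-degree rotation that is not from the question. The getRotationDegrees() method and the RightClickDemo class are only there so the example can be run.

```java
// Hypothetical sketch: the parameterless signature is an assumption.
interface GridObject {
    void handleRightClick();
}

class Building implements GridObject {
    @Override
    public void handleRightClick() {
        // A building cannot be rotated, so a right click does nothing.
    }
}

class ReceptionDesk implements GridObject {
    private int rotationDegrees = 0;

    @Override
    public void handleRightClick() {
        // For a desk, a right click means "rotate me".
        rotate();
    }

    public void rotate() {
        rotationDegrees = (rotationDegrees + 90) % 360;
    }

    public int getRotationDegrees() {
        return rotationDegrees;
    }
}

class RightClickDemo {
    public static void main(String[] args) {
        ReceptionDesk desk = new ReceptionDesk();
        GridObject currentBuilding = desk;

        // No getClass() or instanceof check: the object decides what to do.
        currentBuilding.handleRightClick();
        System.out.println(desk.getRotationDegrees()); // prints 90
    }
}
```

The click handler no longer needs to know which kinds of GridObject exist, so adding a new rotatable item does not mean touching the mouse code.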
You are correct to be worried. A good rule of thumb for Object-oriented design is that whenever you use a construct like if(x instanceof Y) or if(x.getClass().equals(Y.class)), you should start thinking about moving methods up or down, or extracting new methods.
Elliot and John have both presented good ideas on very different directions you could go, but they're both right in that you should definitely move in some direction. Object oriented design is there to help your code become more legible by making branching for different sorts of behaviors more implicit. Examining what sort of object you're looking at and determining what to do based on that can defeat the purpose of using object-oriented design.
I should also warn you that an interface isn't exactly an empty class. There are some significant differences between interfaces and empty abstract classes that have only abstract methods. Instead of thinking of an interface as an empty class, think of an interface as a contract. By implementing an interface, your class promises to provide each of the methods listed in the interface.
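One of those differences, sketched with made-up names (Rotatable, Purchasable, AbstractGridObject and the price of 150 are all assumptions for illustration): a class can extend only one class, but it can sign any number of interface "contracts", and the compiler refuses to build it until every promised method exists.

```java
// Two independent contracts. Neither one holds any state.
interface Rotatable {
    void rotate();
}

interface Purchasable {
    int getPrice();
}

// An abstract class can carry fields and a constructor, which an interface cannot.
abstract class AbstractGridObject {
    private final String name;

    protected AbstractGridObject(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}

// One superclass at most, but as many interfaces as needed.
class ReceptionDesk extends AbstractGridObject implements Rotatable, Purchasable {
    private int rotationDegrees = 0;

    ReceptionDesk() {
        super("Reception desk");
    }

    @Override
    public void rotate() {
        rotationDegrees = (rotationDegrees + 90) % 360;
    }

    @Override
    public int getPrice() {
        return 150; // arbitrary illustrative value
    }

    public int getRotationDegrees() {
        return rotationDegrees;
    }
}

class ContractDemo {
    public static void main(String[] args) {
        ReceptionDesk desk = new ReceptionDesk();
        Rotatable rotatable = desk;     // seen through one contract
        Purchasable purchasable = desk; // seen through the other
        rotatable.rotate();
        System.out.println(desk.getName() + " costs " + purchasable.getPrice());
    }
}
```

Removing getPrice() from ReceptionDesk would be a compile error, which is the "promise" part of the contract.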