As I've always understood it, the main cases where an instanceof is appropriate are:
Implementing Object.equals(Object). So if I were writing a List class, and not extending AbstractList for whatever reason, I would implement equals(o) by first testing o instanceof List, and then comparing elements.
A significant (algorithmic?) optimization for a special case that does not change semantics, but only performance. For example, Collections.binarySearch does an instanceof RandomAccess test, and uses a slightly different binary search for RandomAccess and non-RandomAccess lists.
I don't think instanceof represents a code smell in these two cases. But are there any other cases where it is sensible to use instanceof?
One way to answer your question would be to answer "when does a solid Java library use instanceof?" If we assume Guava is an example of a well designed Java library, we can look at where it uses instanceof to decide when it is acceptable to do.
If we extract the Guava source code jar and grep it, we see instanceof is mentioned 439 times, across 122 files:
$ pwd
/tmp/guava-13.0.1-sources
$ grep -R 'instanceof' | wc -l
439
$ grep -Rl 'instanceof' | wc -l
122
And looking at some of these cases we can see several patterns emerge:
To check for equality
This is the most common usage. It is somewhat implementation-specific, but assuming you do in fact want to measure equality based on the class or interface the object extends or implements, you can use instanceof to ensure the object you're working with is of a compatible type. However, this can cause odd problems if a child class overrides equals() and doesn't respect the same instanceof requirements as the parent. Examples are everywhere, but an easy one is Lists.equalsImpl(), which is used by ImmutableList.
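As a rough illustration of the equality pattern, here is a minimal sketch of an interface-based equality check. The class and method names (ListEquality, listEquals) are made up for this example; it is not Guava's actual Lists.equalsImpl() code, only the same general idea.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Hypothetical helper, only loosely inspired by the idea of an equalsImpl-style method.
final class ListEquality {
    private ListEquality() {}

    static boolean listEquals(List<?> self, Object other) {
        if (other == self) {
            return true;
        }
        // instanceof is false for null, so this check also rejects null.
        if (!(other instanceof List)) {
            return false;
        }
        List<?> that = (List<?>) other;
        if (self.size() != that.size()) {
            return false;
        }
        // Compare element by element, in iteration order.
        Iterator<?> a = self.iterator();
        Iterator<?> b = that.iterator();
        while (a.hasNext() && b.hasNext()) {
            if (!Objects.equals(a.next(), b.next())) {
                return false;
            }
        }
        return !a.hasNext() && !b.hasNext();
    }
}
```

Because the check is against the List interface rather than a concrete class, an ArrayList-backed list and a LinkedList with the same elements compare as equal, which is what the List contract asks for.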
To short-circuit unnecessary object construction
You can use instanceof to check if the passed in argument can be safely used or returned without further transforming it, for instance if it's already an instance of the desired class, or if we know it's immutable. See examples in CharMatcher.forPredicate(), Suppliers.memoize(), ImmutableList.copyOf(), etc.
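A minimal sketch of that short-circuit, assuming a hypothetical immutable FrozenList class (this is not Guava's ImmutableList, just the same shape of idea):

```java
import java.util.AbstractList;
import java.util.Collection;

// Hypothetical immutable list. AbstractList's mutators throw
// UnsupportedOperationException unless overridden, so this stays read-only.
final class FrozenList<E> extends AbstractList<E> {
    private final Object[] items;

    private FrozenList(Object[] items) {
        this.items = items;
    }

    static <E> FrozenList<E> copyOf(Collection<? extends E> source) {
        if (source instanceof FrozenList) {
            // Already immutable: reuse it instead of copying.
            // The cast is safe only because the list is read-only.
            @SuppressWarnings("unchecked")
            FrozenList<E> already = (FrozenList<E>) source;
            return already;
        }
        return new FrozenList<>(source.toArray());
    }

    @Override
    @SuppressWarnings("unchecked")
    public E get(int index) {
        return (E) items[index];
    }

    @Override
    public int size() {
        return items.length;
    }
}
```

The caller always gets a FrozenList back either way; the instanceof test only decides whether a copy is needed.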
To access implementation-details without exposing different behavior
This can be seen all over the place in Guava, but notably in the static utility classes in the com.google.common.collect package, for instance in Iterables.size() where it calls Collection.size() if possible, and otherwise counts the number of items in the iterator in O(n) time.
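The size example can be sketched roughly as follows. The Sizes class is a made-up name, and this is an approximation of the idea rather than Guava's source:

```java
import java.util.Collection;
import java.util.Iterator;

// Hypothetical utility illustrating the "fast path via instanceof" idea.
final class Sizes {
    private Sizes() {}

    static int size(Iterable<?> iterable) {
        if (iterable instanceof Collection) {
            // Fast path: collections already know their size.
            return ((Collection<?>) iterable).size();
        }
        // Slow path: walk the iterator and count, O(n).
        int count = 0;
        for (Iterator<?> it = iterable.iterator(); it.hasNext(); it.next()) {
            count++;
        }
        return count;
    }
}
```

Both branches return the same answer; only the cost differs, which is what keeps the instanceof check an implementation detail.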
To avoid calling toString()
I'm skeptical this merits being done in more than a very select few cases, but assuming you're sure you're doing the right thing, you can avoid needlessly converting CharSequence objects into Strings with instanceof, like is done in Joiner.toString(Object).
To do complex Exception handling
Obviously the "right" thing to do is use a try/catch block (though really, that's doing instanceof checks already), but sometimes you have more complex handling logic that merits using conditional blocks or passing processing off to a separate method, for instance pulling out causes or having implementation-specific processing. An example can be found in SimpleTimeLimiter.throwCause().
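To give a feel for what such cause-unwrapping logic can look like, here is a small sketch. The Causes class and its throwCause method are written for this example and are not the SimpleTimeLimiter implementation:

```java
// Hypothetical helper: rethrows the cause of a wrapper exception,
// dispatching on the runtime type of the cause.
final class Causes {
    private Causes() {}

    static void throwCause(Exception wrapper) throws Exception {
        Throwable cause = wrapper.getCause();
        if (cause == null) {
            // Nothing to unwrap.
            throw wrapper;
        }
        if (cause instanceof Error) {
            throw (Error) cause;
        }
        if (cause instanceof Exception) {
            throw (Exception) cause;
        }
        // A Throwable that is neither Error nor Exception: keep the wrapper.
        throw wrapper;
    }
}
```

A plain catch clause cannot express "look inside the exception and branch on what it wraps", which is why conditional logic like this ends up using instanceof.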
One thing that stands out looking at these patterns is that nearly all of them address problems I should not be solving myself. They're useful in library code, e.g. in Iterables, but if I'm implementing this behavior, I should probably be asking myself whether there are libraries or utilities that already solve this for me.
In all cases, I would say that instanceof checks should only ever be used internally as an implementation detail - that is to say the caller of any method that relies on an instanceof check should not be able to (easily) tell that's what you did. For instance, ImmutableList.copyOf() always returns an ImmutableList. As an implementation detail it uses instanceof to avoid constructing new ImmutableLists, but that is not necessary to acceptably provide the expected behavior.
As an aside, it was amusing coming across your name, Louis, as I was digging through Guava's source code. I swear I had no idea!
Legacy code or APIs outside of your control are a legitimate use-case for instanceof. (Even then I'd rather write an OO layer over it, but timing sometimes precludes a redesign like that.)
In particular, factories based on external class hierarchies seem a common usage.
Your first case is an example where I would not use the instanceof operator, but see whether the classes are equal:
o != null && o.getClass() == this.getClass()
This prevents an instance of A (where A extends B) and an instance of B from being considered equal.
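A minimal sketch of the getClass() approach, using two hypothetical classes (GridPoint and its subclass ColoredGridPoint) invented for this example:

```java
import java.util.Objects;

// Hypothetical base class using a getClass() comparison in equals().
class GridPoint {
    final int x;
    final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        // Exact class match: a subclass instance is never equal to a base instance.
        if (o == null || o.getClass() != getClass()) {
            return false;
        }
        GridPoint p = (GridPoint) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

// Hypothetical subclass that adds state to the comparison.
class ColoredGridPoint extends GridPoint {
    final String color;

    ColoredGridPoint(int x, int y, String color) {
        super(x, y);
        this.color = color;
    }

    @Override
    public boolean equals(Object o) {
        // super.equals already guarantees o has exactly this class.
        return super.equals(o) && color.equals(((ColoredGridPoint) o).color);
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y, color);
    }
}
```

With this approach the comparison stays symmetric: a GridPoint and a ColoredGridPoint at the same coordinates are unequal in both directions. The trade-off is that it rules out equality across implementations, which an interface like List explicitly requires.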
Other cases come to mind immediately, though I am pretty sure more valid ones exist:
factory instances where you have, for example, a canCreate and a create method which receive a general interface as a parameter. Each of the factories can handle one specific implementation of the interface, so it requires an instanceof. Defining only the interface in the factory abstract class/interface allows you, for example, to write a composite factory
composite implementations (as illustrated in my first example)
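The factory case might look something like the sketch below. Every type here (Shape, Circle, Square, LabelFactory and its implementations) is hypothetical and exists only to show the canCreate/create shape and the composite built on top of it:

```java
import java.util.Arrays;
import java.util.List;

// All names below are hypothetical, for illustration only.
interface Shape {}

final class Circle implements Shape {}

final class Square implements Shape {}

interface LabelFactory {
    boolean canCreate(Shape shape);

    String create(Shape shape);
}

final class CircleLabelFactory implements LabelFactory {
    @Override
    public boolean canCreate(Shape shape) {
        return shape instanceof Circle;
    }

    @Override
    public String create(Shape shape) {
        if (!canCreate(shape)) {
            throw new IllegalArgumentException("Unsupported shape: " + shape);
        }
        return "circle";
    }
}

final class SquareLabelFactory implements LabelFactory {
    @Override
    public boolean canCreate(Shape shape) {
        return shape instanceof Square;
    }

    @Override
    public String create(Shape shape) {
        if (!canCreate(shape)) {
            throw new IllegalArgumentException("Unsupported shape: " + shape);
        }
        return "square";
    }
}

// The composite only knows the LabelFactory interface, not the concrete shapes.
final class CompositeLabelFactory implements LabelFactory {
    private final List<LabelFactory> delegates;

    CompositeLabelFactory(LabelFactory... delegates) {
        this.delegates = Arrays.asList(delegates);
    }

    @Override
    public boolean canCreate(Shape shape) {
        for (LabelFactory delegate : delegates) {
            if (delegate.canCreate(shape)) {
                return true;
            }
        }
        return false;
    }

    @Override
    public String create(Shape shape) {
        // Delegate to the first factory that accepts the shape.
        for (LabelFactory delegate : delegates) {
            if (delegate.canCreate(shape)) {
                return delegate.create(shape);
            }
        }
        throw new IllegalArgumentException("No factory for shape: " + shape);
    }
}
```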
As you have mentioned, the "correct" uses of instanceof are rather limited. As far as I know, you have basically summed up the two main uses.
However, you can generalize your statements a bit as follows:
Type-checking before necessary casts.
Implementing special case scenarios that depend on very particular class instances
Related
when programming in Java I practically always, just out of habit, write something like this:
public List<String> foo() {
    return new ArrayList<String>();
}
Most of the time without even thinking about it. Now, the question is: should I always specify the interface as the return type? Or is it advisable to use the actual implementation of the interface, and if so, under what circumstances?
It is obvious that using the interface has a lot of advantages (that's why it's there). In most cases it doesn't really matter what concrete implementation is used by a library function. But maybe there are cases where it does matter. For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList:
List<String> bar = foo();
List<String> myList = bar instanceof LinkedList ? new ArrayList<String>(bar) : bar;
but that just seems horrible and my coworkers would probably lynch me in the cafeteria. And rightfully so.
What do you guys think? What are your guidelines, when do you tend towards the abstract solution, and when do you reveal details of your implementation for potential performance gains?
Return the appropriate interface to hide implementation details. Your clients should only care about what your object offers, not how you implemented it. If you start with a private ArrayList and decide later on that something else (e.g., LinkedList, skip list, etc.) is more appropriate, you can change the implementation without affecting clients if you return the interface. The moment you return a concrete type, that opportunity is lost.
For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList.
As everybody else has mentioned, you just mustn't care about how the library has implemented the functionality; this reduces coupling and increases the maintainability of the library.
If you, as a library client, can demonstrate that the implementation is performing badly for your use case, you can then contact the person in charge and discuss the best path to follow (a new method for this case, or just changing the implementation).
That said, your example reeks of premature optimization.
If the method is or can be critical, it might mention the implementation details in the documentation.
Without being able to justify it with reams of CS quotes (I'm self taught), I've always gone by the mantra of "Accept the least derived, return the most derived," when designing classes and it has stood me well over the years.
In terms of interface versus concrete return type, I guess that means that if you are trying to reduce dependencies and/or decouple, returning the interface is generally more useful. However, if the concrete class implements more than that interface, it is usually more useful to the callers of your method to get the concrete class back (i.e. the "most derived") rather than arbitrarily restrict them to a subset of the returned object's functionality, unless you actually need to restrict them. Then again, you could also just increase the coverage of the interface. I compare needless restrictions like this to thoughtless sealing of classes; you never know. Just to talk a bit about the former part of that mantra (for other readers), accepting the least derived also gives maximum flexibility to callers of your method.
-Oisin
Sorry to disagree, but I think the basic rule is as follows:
For input arguments use the most generic.
For output values, the most specific.
So, in this case you want to declare the implementation as:
public ArrayList<String> foo() {
    return new ArrayList<String>();
}
Rationale:
The input case is already known and explained by everyone: use the interface, period. However, the output case can look counter-intuitive.
You want to return the implementation because you want the client to have the most information about what it is receiving. In this case, more knowledge is more power.
Example 1: the client wants to get the 5th element:
return Collection: must iterate until the 5th element
return List: list.get(4)
Example 2: the client wants to remove the 5th element:
return List: must create a new list without the specified element (list.remove() is optional).
return ArrayList: arrayList.remove(4)
So it's a big truth that using interfaces is great because it promotes reusability, reduces coupling, improves maintainability and makes people happy ... but only when used as input.
So, again, the rule can be stated as:
Be flexible for what you offer.
Be informative with what you deliver.
So, next time, please return the implementation.
In OO programming, we want to encapsulate the data as much as possible: hide the actual implementation as much as possible, abstracting the types as high as possible.
In this context, I would answer: only return what is meaningful. Does it make sense at all for the return value to be the concrete class? In your example, ask yourself: will anyone use a LinkedList-specific method on the return value of foo?
If no, just use the higher-level interface. It's much more flexible, and allows you to change the backend.
If yes, ask yourself: can't I refactor my code to return the higher-level interface? :)
The more abstract your code is, the fewer changes you are required to make when changing a backend. It's as simple as that.
If, on the other hand, you end up casting the return values to the concrete class, that's a strong sign that you should probably return the concrete class instead. Your users/teammates should not have to know about more or less implicit contracts: if you need to use the concrete methods, just return the concrete class, for clarity.
In a nutshell: code abstract, but explicitly :)
In general, for a public facing interface such as APIs, returning the interface (such as List) over the concrete implementation (such as ArrayList) would be better.
The use of an ArrayList or LinkedList is an implementation detail of the library that should be chosen with the most common use case of that library in mind. And of course, internally, having private methods handing off LinkedLists wouldn't necessarily be a bad thing, if it provides facilities that make the processing easier.
There is no reason that a concrete class shouldn't be used in the implementation, unless there is a good reason to believe that some other List class would be used later on. But then again, changing the implementation details shouldn't be as painful as long as the public facing portion is well-designed.
The library itself should be a black box to its consumers, so they don't really have to worry about what's going on internally. That also means that the library should be designed so that it can be used in the way it is intended.
It doesn't matter all that much whether an API method returns an interface or a concrete class; despite what everyone here says, you almost never change the implementation class once the code is written.
What's far more important: always use minimum-scope interfaces for your method parameters! That way, clients have maximal freedom and can use classes your code doesn't even know about.
When an API method returns ArrayList, I have absolutely no qualms with that, but when it demands an ArrayList (or, all too common, Vector) parameter, I consider hunting down the programmer and hurting him, because it means that I can't use Arrays.asList(), Collections.singletonList() or Collections.EMPTY_LIST.
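To make the parameter point concrete, here is a small sketch with a hypothetical Totals.sum method that accepts the List interface. Had it demanded an ArrayList, none of the list-producing helpers mentioned above could be passed in directly.

```java
import java.util.List;

// Hypothetical utility: the parameter type is the interface, not ArrayList.
final class Totals {
    private Totals() {}

    static int sum(List<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }
}
```

Callers can then write Totals.sum(Arrays.asList(1, 2, 3)) or Totals.sum(Collections.singletonList(4)) without copying into an ArrayList first.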
As a rule, I only pass back internal implementations if I am in some private, inner workings of a library, and even so only sparingly. For everything that is public and likely to be called from the outside of my module I use interfaces, and also the Factory pattern.
Using interfaces in such a way has proven to be a very reliable way to write reusable code.
The main question has been answered already, and you should always use the interface. I would, however, just like to comment on this:
It is obvious that using the interface has a lot of advantages (that's why it's there). In most cases it doesn't really matter what concrete implementation is used by a library function. But maybe there are cases where it does matter. For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList.
If you are returning a data structure that you know has poor random access performance -- O(n) and typically a LOT of data -- there are other interfaces you should be specifying instead of List, like Iterable so that anyone using the library will be fully aware that only sequential access is available.
Picking the right type to return isn't just about interface versus concrete implementation, it is also about selecting the right interface.
You use an interface to abstract away from the actual implementation. The interface is basically just a blueprint for what your implementation can do.
Interfaces are good design because they allow you to change implementation details without having to fear that any of their consumers are directly affected, as long as your implementation still does what your interface says it does.
To work with interfaces you would instantiate them like this:
IParser parser = new Parser();
Now IParser would be your interface, and Parser would be your implementation. Now when you work with the parser object from above, you will work against the interface (IParser), which in turn will work against your implementation (Parser).
That means that you can change the inner workings of Parser as much as you want, it will never affect code that works against your IParser parser interface.
In general, use the interface in all cases where you have no need of the functionality of the concrete class. Note that for lists, Java has added the RandomAccess marker interface, primarily to distinguish a common case where an algorithm may need to know whether get(i) is constant time or not.
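A minimal sketch of how an algorithm might consult RandomAccess while still accepting any List. The ListSums class is a made-up example, not a JDK method:

```java
import java.util.List;
import java.util.RandomAccess;

// Hypothetical utility: same result for every List, different traversal strategy.
final class ListSums {
    private ListSums() {}

    static long sum(List<Integer> values) {
        long total = 0;
        if (values instanceof RandomAccess) {
            // get(i) is expected to be cheap here, so an indexed loop is fine.
            for (int i = 0; i < values.size(); i++) {
                total += values.get(i);
            }
        } else {
            // For sequential lists such as LinkedList, iterate instead of indexing.
            for (int value : values) {
                total += value;
            }
        }
        return total;
    }
}
```

The caller never has to copy a LinkedList into an ArrayList "to be on the safe side"; the method adapts internally and the signature stays List.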
For uses of code, Michael above is right that being as generic as possible in the method parameters is often even more important. This is especially true when testing such a method.
You'll find (or have found) that as you return interfaces, they permeate through your code. e.g. you return an interface from method A and you have to then pass an interface to method B.
What you're doing is programming by contract, albeit in a limited fashion.
This gives you enormous scope to change implementations under the covers (provided these new objects fulfill the existing contracts/expected behaviours).
Given all of this, you have benefits in terms of choosing your implementation, and how you can substitute behaviours (including testing - using mocking, for example). In case you hadn't guessed, I'm all in favour of this and try to reduce to (or introduce) interfaces wherever possible.
I have a situation where multiple classes can extend one core class. There's a property (say, isAtomic), which in some subclasses will depend on the data but in other subclasses will constantly be true. I'd like, if possible, the check to be as inexpensive as possible in the latter case. I can think of several approaches:
1. The obvious: abstract boolean isAtomic(), which would be implemented to return true/false in the "constant" subclasses,
2. A subclass Atomic extends Base which would implement isAtomic() as return true; and which the other "constant" subclasses would extend (probably not too different from 1),
3. The base class implementing isAtomic as return true; and the non-constant subclasses overriding this definition,
4. An interface Atomic implemented by the atomic subclasses, which would obviously not return anything but would be checked against using x instanceof Atomic || x.isAtomic().
I'm not too experienced in OOP, but something tells me that the last approach represents the concept nicely and saves a function call, despite the more complicated usage. But I don't know whether an instanceof is indeed "cheaper" than calling a function. Approach 3 should be quicker than 1 or 2 but feels against the nature of the isAtomic function. What do the experts say? Or is there another intuitive way, better than those listed, that I didn't think of?
Go with your first choice. No one here can tell which of the options is the best for your particular case. If the first choice turns out to be a bad one and you have to revisit it, you'll have learned something.
As for efficiency, the JVM may inline calls to short methods. This can happen even if you are calling an abstract method, especially when the call site is monomorphic. Performance optimization is very difficult, so don't worry about it until you know you are going to have a problem.
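For reference, the first option from the question could look roughly like this. The class names (Element, Symbol, Sequence) and the rule for when a Sequence counts as atomic are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: every subclass must answer isAtomic().
abstract class Element {
    abstract boolean isAtomic();
}

// A "constant" subclass: always atomic.
final class Symbol extends Element {
    @Override
    boolean isAtomic() {
        return true;
    }
}

// A data-dependent subclass. The rule used here (atomic when it has
// no parts) is an arbitrary assumption for the sake of the example.
final class Sequence extends Element {
    private final List<Element> parts = new ArrayList<>();

    void append(Element part) {
        parts.add(part);
    }

    @Override
    boolean isAtomic() {
        return parts.isEmpty();
    }
}
```

Callers simply write element.isAtomic() with no instanceof at the call site, and a method as small as Symbol.isAtomic() is a typical candidate for inlining.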
I have a situation where multiple classes can extend one core class.
I know nothing about your program, but you may want to avoid situations like this.
Are there any reasons/arguments not to implement a Java collection that restricts its members based on a predicate/constraint?
Given that such functionality should be needed often, I was expecting it to be implemented already in collections frameworks like apache-commons or Guava. But while apache indeed had it, Guava deprecated its version of it and recommends not using similar approaches.
The Collection interface contract states that a collection may place any restrictions on its elements as long as they are properly documented, so I'm unable to see why a guarded collection would be discouraged. What other option is there to, say, ensure an Integer collection never contains negative values without hiding the whole collection?
It is just a matter of preference (look at the thread about checking before vs. checking after); I think that is what it boils down to. Also, checking only on add() is good enough only for immutable objects.
There can hardly be one ("acceptable") answer, so I'll just add some thoughts:
As mentioned in the comments, the Collection#add(E) already allows for throwing an IllegalArgumentException, with the reason
if some property of the element prevents it from being added to this collection
So one could say that this case was explicitly considered in the design of the collection interface, and there is no obvious, profound, purely technical (interface-contract related) reason to not allow creating such a collection.
However, when thinking about possible application patterns, one quickly finds cases where the observed behavior of such a collection could be ... counterintuitive, to say the least.
One was already mentioned by dcsohl in the comments, and referred to cases where such a collection would only be a view on another collection:
List<Integer> listWithIntegers = new ArrayList<Integer>();
List<Integer> listWithPositiveIntegers =
createView(listWithIntegers, e -> e > 0);
//listWithPositiveIntegers.add(-1); // Would throw IllegalArgumentException
listWithIntegers.add(-1); // Fine
// This would be true:
assert(listWithPositiveIntegers.contains(-1));
However, one could argue that
Such a collection would not necessarily have to be only a view. Instead, one could enforce that only new collections with such constraints may be created
The behavior is similar to that of Collections.unmodifiableCollection(Collection), which is widely accepted as it is. (Although it serves a far broader and omnipresent use case, namely preventing the internal state of a class from being exposed by returning a modifiable version of a collection via an accessor method.)
But in this case, the potential for "inconsistencies" is much higher.
For example, consider a call to Collection#addAll(Collection). It also allows throwing an IllegalArgumentException "if some property of an element of the specified collection prevents it from being added to this collection". But there are no guarantees about things like atomicity. To put it another way: it is not specified what the state of the collection will be when such an exception is thrown. Imagine a case like this:
List<Integer> listWithPositiveIntegers = createList(e -> e > 0);
listWithPositiveIntegers.add(1); // Fine
listWithPositiveIntegers.add(2); // Fine
listWithPositiveIntegers.addAll(Arrays.asList(3,-4,5)); // Throws
assert(listWithPositiveIntegers.contains(3)); // True or false?
assert(listWithPositiveIntegers.contains(5)); // True or false?
(It may be subtle, but it may be an issue).
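To make the question concrete, here is a minimal sketch of one possible implementation. GuardedList and addAllAtomically are hypothetical names, not library classes. With the addAll inherited from AbstractList, elements are added one by one, so 3 ends up in the list and 5 does not; other implementations could legitimately behave differently.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical guarded list, for illustration only.
final class GuardedList<E> extends AbstractList<E> {
    private final List<E> delegate = new ArrayList<>();
    private final Predicate<? super E> allowed;

    GuardedList(Predicate<? super E> allowed) { this.allowed = allowed; }

    @Override public E get(int index) { return delegate.get(index); }
    @Override public int size() { return delegate.size(); }

    // The inherited addAll(...) calls this once per element, so it can fail half-way.
    @Override public void add(int index, E element) {
        if (!allowed.test(element)) {
            throw new IllegalArgumentException("Rejected by predicate: " + element);
        }
        delegate.add(index, element);
    }

    // All-or-nothing alternative: validate every element before adding any of them.
    boolean addAllAtomically(Collection<? extends E> elements) {
        for (E element : elements) {
            if (!allowed.test(element)) {
                throw new IllegalArgumentException("Rejected by predicate: " + element);
            }
        }
        return delegate.addAll(elements);
    }
}
```

Whether partial insertion or all-or-nothing is the right behavior is a design decision that the Collection contract leaves open, so it would need to be documented either way.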
All this might become even trickier when the condition changes after the collection has been created (regardless of whether it is only a view or not). For example, one could imagine a sequence of calls like this:
List<Integer> listWithPredicate = create(predicate);
listWithPredicate.add(-1); // Fine
someMethod();
listWithPredicate.add(-1); // Throws
Where in someMethod(), there is an innocent line like
predicate.setForbiddingNegatives(true);
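A small sketch of that scenario, assuming a hypothetical mutable predicate class SignPolicy and a hypothetical guardedAdd helper standing in for the guarded collection:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical predicate whose behavior can be changed after it was handed out.
final class SignPolicy implements Predicate<Integer> {
    private boolean forbiddingNegatives;

    void setForbiddingNegatives(boolean forbiddingNegatives) {
        this.forbiddingNegatives = forbiddingNegatives;
    }

    @Override public boolean test(Integer value) {
        return !forbiddingNegatives || value >= 0;
    }
}

final class MutablePredicateDemo {
    // Adds the value only if the policy accepts it at the time of the call.
    static void guardedAdd(List<Integer> target, Predicate<Integer> policy, int value) {
        if (!policy.test(value)) {
            throw new IllegalArgumentException("Rejected by predicate: " + value);
        }
        target.add(value);
    }
}
```

After the policy is switched, the element that was accepted earlier is still in the list, so the collection now holds a value that its own predicate rejects.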
One of the comments already mentioned possible performance issues. This is certainly true, but I think that this is not really a strong technical argument: There are no formal complexity guarantees for the runtime of any method of the Collection interface, anyhow. You don't know how long a collection.add(e) call takes. For a LinkedList it is O(1), but for a TreeSet it is O(log n) (and who knows what n is at this point in time).
Maybe the performance issue and the possible inconsistencies can be considered as special cases of a more general statement:
Such a collection would basically allow arbitrary code to be executed during many operations, depending on the implementation of the predicate.
This may literally have arbitrary implications, and makes reasoning about algorithms, performance and the exact behavior (in terms of consistency) impossible.
The bottom line is: There are many possible reasons to not use such a collection. But I can't think of a strong and general technical reason. So there may be application cases for such a collection, but the caveats should be kept in mind, considering how exactly such a collection is intended to be used.
I would say that such a collection would have too many responsibilities and violate SRP.
The main issue I see here is the readability and maintainability of the code that uses the collection. Suppose you have a collection to which you allow adding only positive integers (Collection<Integer>) and you use it throughout the code. Then the requirements change and you are only allowed to add odd positive integers to it. Because there are no compile time checks, it would be much harder for you to find all the occurrences in the code where you add elements to that collection than it would be if you had a separate wrapper class which encapsulates the collection.
Although of course not even close to such an extreme, it bears some resemblance to using Object references for all objects in the application.
The better approach is to utilize compile time checks and follow the well-established OOP principles like type safety and encapsulation. That means creating a separate wrapper class or creating a separate type for collection elements.
For example, if you really want to make quite sure that you only work with positive integers in a context, you could create a separate type PositiveInteger extends Number and then add them to a Collection<PositiveInteger>. This way you get compile time safety and converting PositiveInteger to OddPositiveInteger requires much less effort.
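A minimal sketch of such a type follows. The factory method name of and the decision to reject zero are assumptions made for this illustration:

```java
import java.util.ArrayList;
import java.util.Collection;

// Sketch of a dedicated value type; the constraint is checked once, at construction.
final class PositiveInteger extends Number {
    private final int value;

    private PositiveInteger(int value) { this.value = value; }

    // Hypothetical factory method; zero is treated as not positive here.
    static PositiveInteger of(int value) {
        if (value <= 0) {
            throw new IllegalArgumentException("Not a positive integer: " + value);
        }
        return new PositiveInteger(value);
    }

    @Override public int intValue() { return value; }
    @Override public long longValue() { return value; }
    @Override public float floatValue() { return value; }
    @Override public double doubleValue() { return value; }

    @Override public boolean equals(Object other) {
        return other instanceof PositiveInteger && ((PositiveInteger) other).value == value;
    }
    @Override public int hashCode() { return Integer.hashCode(value); }
    @Override public String toString() { return Integer.toString(value); }
}
```

A plain Collection<PositiveInteger> then needs no runtime guard of its own, because an invalid element cannot be constructed in the first place.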
Enums are an excellent example of preferring dedicated types vs runtime-constrained values (constant strings or integers).
When programming in Java I practically always, just out of habit, write something like this:
public List<String> foo() {
return new ArrayList<String>();
}
Most of the time without even thinking about it. Now, the question is: should I always specify the interface as the return type? Or is it advisable to use the actual implementation of the interface, and if so, under what circumstances?
It is obvious that using the interface has a lot of advantages (that's why it's there). In most cases it doesn't really matter what concrete implementation is used by a library function. But maybe there are cases where it does matter. For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList:
List bar = foo();
List myList = bar instanceof LinkedList ? new ArrayList(bar) : bar;
but that just seems horrible and my coworkers would probably lynch me in the cafeteria. And rightfully so.
What do you guys think? What are your guidelines, when do you tend towards the abstract solution, and when do you reveal details of your implementation for potential performance gains?
Return the appropriate interface to hide implementation details. Your clients should only care about what your object offers, not how you implemented it. If you start with a private ArrayList and decide later on that something else (e.g., LinkedList, skip list, etc.) is more appropriate, you can change the implementation without affecting clients if you return the interface. The moment you return a concrete type, the opportunity is lost.
For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList.
As everybody else has mentioned, you just mustn't care about how the library has implemented the functionality, in order to reduce coupling and increase the maintainability of the library.
If you, as a library client, can demonstrate that the implementation is performing badly for your use case, you can then contact the person in charge and discuss the best path to follow (a new method for this case or just changing the implementation).
That said, your example reeks of premature optimization.
If the method is or can be critical, it might mention the implementation details in the documentation.
Without being able to justify it with reams of CS quotes (I'm self taught), I've always gone by the mantra of "Accept the least derived, return the most derived," when designing classes and it has stood me well over the years.
I guess what that means in terms of interface versus concrete return is that if you are trying to reduce dependencies and/or decouple, returning the interface is generally more useful. However, if the concrete class implements more than that interface, it is usually more useful to the callers of your method to get the concrete class back (i.e. the "most derived") rather than arbitrarily restrict them to a subset of that returned object's functionality, unless you actually need to restrict them. Then again, you could also just increase the coverage of the interface. Needless restrictions like this I compare to thoughtless sealing of classes; you never know. Just to talk a bit about the former part of that mantra (for other readers), accepting the least derived also gives maximum flexibility for callers of your method.
-Oisin
Sorry to disagree, but I think the basic rule is as follows:
For input arguments use the most generic.
For output values, the most specific.
So, in this case you want to declare the implementation as:
public ArrayList<String> foo() {
return new ArrayList<String>();
}
Rationale:
The input case is already known and explained by everyone: use the interface, period. However, the output case can look counter-intuitive.
You want to return the implementation because you want the client to have the most information about what it is receiving. In this case, more knowledge is more power.
Example 1: the client wants to get the 5th element:
return Collection: must iterate until the 5th element
return List: list.get(4)
Example 2: the client wants to remove the 5th element:
return List: must create a new list without the specified element (list.remove() is optional).
return ArrayList: arrayList.remove(4)
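The two examples can be sketched as follows. The class and method names are hypothetical and only illustrate what a caller can do with each declared return type:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

final class FifthElement {
    // With only a Collection, the caller has to walk the iterator to reach the 5th element.
    static <T> T viaIteration(Collection<T> items) {
        Iterator<T> it = items.iterator();
        for (int i = 0; i < 4; i++) {
            it.next();
        }
        return it.next();
    }

    // With a List, positional access is part of the declared type.
    static <T> T viaIndex(List<T> items) {
        return items.get(4);
    }

    // List.remove(int) is an optional operation, so a defensive caller copies first.
    static <T> List<T> withoutFifth(List<T> items) {
        List<T> copy = new ArrayList<>(items);
        copy.remove(4);
        return copy;
    }
}
```

A fixed-size list such as the one returned by Arrays.asList shows why the copy may be needed: calling remove(4) on it throws UnsupportedOperationException.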
So it's a big truth that using interfaces is great because it promotes reusability, reduces coupling, improves maintainability and makes people happy ... but only when used as input.
So, again, the rule can be stated as:
Be flexible for what you offer.
Be informative with what you deliver.
So, next time, please return the implementation.
In OO programming, we want to encapsulate as much as possible the data. Hide as much as possible the actual implementation, abstracting the types as high as possible.
In this context, I would answer: only return what is meaningful. Does it make sense at all for the return value to be the concrete class? In your example, ask yourself: will anyone use a LinkedList-specific method on the return value of foo?
If no, just use the higher-level interface. It's much more flexible, and allows you to change the backend.
If yes, ask yourself: can't I refactor my code to return the higher-level interface? :)
The more abstract your code is, the fewer changes you are required to make when changing a backend. It's as simple as that.
If, on the other hand, you end up casting the return values to the concrete class, well, that's a strong sign that you should probably return the concrete class instead. Your users/teammates should not have to know about more or less implicit contracts: if you need to use the concrete methods, just return the concrete class, for clarity.
In a nutshell: code abstract, but explicitly :)
In general, for a public facing interface such as APIs, returning the interface (such as List) over the concrete implementation (such as ArrayList) would be better.
The use of an ArrayList or LinkedList is an implementation detail of the library that should be considered for the most common use case of that library. And of course, internally, having private methods handing off LinkedLists wouldn't necessarily be a bad thing, if it provides facilities that would make the processing easier.
There is no reason that a concrete class shouldn't be used in the implementation, unless there is a good reason to believe that some other List class would be used later on. But then again, changing the implementation details shouldn't be as painful as long as the public facing portion is well-designed.
The library itself should be a black box to its consumers, so they don't really have to worry about what's going on internally. That also means that the library should be designed so that it can be used in the way it is intended.
It doesn't matter all that much whether an API method returns an interface or a concrete class; despite what everyone here says, you almost never change the implementation class once the code is written.
What's far more important: always use minimum-scope interfaces for your method parameters! That way, clients have maximal freedom and can use classes your code doesn't even know about.
When an API method returns ArrayList, I have absolutely no qualms with that, but when it demands an ArrayList (or, all too common, Vector) parameter, I consider hunting down the programmer and hurting him, because it means that I can't use Arrays.asList(), Collections.singletonList() or Collections.EMPTY_LIST.
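As a short illustration of that point, consider a hypothetical method joinAll that declares its parameter as List<String>. The class and method names are made up for this sketch:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class Labels {
    // Accepting the List interface lets callers pass any list they already have.
    static String joinAll(List<String> parts) {
        return String.join(",", parts);
    }
}
```

Callers can then pass Arrays.asList("a", "b"), Collections.singletonList("x") or Collections.emptyList() directly. If the parameter were declared as ArrayList<String>, none of those calls would compile without an explicit copy.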
As a rule, I only pass back internal implementations if I am in some private, inner workings of a library, and even so only sparingly. For everything that is public and likely to be called from the outside of my module I use interfaces, and also the Factory pattern.
Using interfaces in such a way has proven to be a very reliable way to write reusable code.
The main question has been answered already and you should always use the interface. I however would just like to comment on
It is obvious that using the interface has a lot of advantages (that's why it's there). In most cases it doesn't really matter what concrete implementation is used by a library function. But maybe there are cases where it does matter. For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList.
If you are returning a data structure that you know has poor random access performance -- O(n) and typically a LOT of data -- there are other interfaces you should be specifying instead of List, like Iterable so that anyone using the library will be fully aware that only sequential access is available.
Picking the right type to return isn't just about interface versus concrete implementation, it is also about selecting the right interface.
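A minimal sketch of that idea, assuming a hypothetical EventLog class that stores its entries in a LinkedList:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Hypothetical class, for illustration only.
final class EventLog {
    private final LinkedList<String> entries = new LinkedList<>();

    void record(String entry) {
        entries.add(entry);
    }

    // Declaring Iterable signals that callers should only traverse the entries in order.
    Iterable<String> entries() {
        return Collections.unmodifiableList(entries);
    }
}
```

The declared type offers no get(int), so callers are steered toward a for-each loop instead of index-based access on a structure where that would be slow.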
You use interface to abstract away from the actual implementation. The interface is basically just a blueprint for what your implementation can do.
Interfaces are good design because they allow you to change implementation details without having to fear that any of their consumers are directly affected, as long as your implementation still does what your interface says it does.
To work with interfaces you would instantiate them like this:
IParser parser = new Parser();
Now IParser would be your interface, and Parser would be your implementation. Now when you work with the parser object from above, you will work against the interface (IParser), which in turn will work against your implementation (Parser).
That means that you can change the inner workings of Parser as much as you want, it will never affect code that works against your IParser parser interface.
In general, use the interface in all cases if you have no need of the functionality of the concrete class. Note that for lists, Java has added a RandomAccess marker interface primarily to distinguish a common case where an algorithm may need to know if get(i) is constant time or not.
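A rough sketch of how an algorithm can use that marker while still accepting the plain List interface (the Sums class is hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

final class Sums {
    // Picks the traversal strategy based on the RandomAccess marker; the result is the same either way.
    static long sum(List<Integer> values) {
        long total = 0;
        if (values instanceof RandomAccess) {
            // Indexed access is expected to be cheap for these lists (e.g. ArrayList).
            for (int i = 0; i < values.size(); i++) {
                total += values.get(i);
            }
        } else {
            // Sequential lists (e.g. LinkedList) are better served by their iterator.
            for (int value : values) {
                total += value;
            }
        }
        return total;
    }
}
```

The caller never has to copy a LinkedList into an ArrayList "to be on the safe side"; the method adapts without exposing a concrete type in its signature.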
For uses of code, Michael above is right that being as generic as possible in the method parameters is often even more important. This is especially true when testing such a method.
You'll find (or have found) that as you return interfaces, they permeate through your code. e.g. you return an interface from method A and you have to then pass an interface to method B.
What you're doing is programming by contract, albeit in a limited fashion.
This gives you enormous scope to change implementations under the covers (provided these new objects fulfill the existing contracts/expected behaviours).
Given all of this, you have benefits in terms of choosing your implementation, and how you can substitute behaviours (including testing - using mocking, for example). In case you hadn't guessed, I'm all in favour of this and try to reduce to (or introduce) interfaces wherever possible.