Consider this code snippet:
class MyClass{
private List myList;
//...
public List getList(){
return myList;
}
}
As Java passes object references by value, my understanding is that any object calling getList() will obtain a reference to myList, allowing it to modify myList despite it being private. Is that correct?
And, if it is correct, should I be using
return new LinkedList(myList);
to create a copy and pass back a reference to the copy, rather than the original, in order to prevent unauthorised access to the list referenced by myList?
I do that. Better yet, sometimes I return an unmodifiable copy using the Collections API.
If you don't, your reference is not private. Anyone that has a reference can alter your private state. Same holds true for any mutable reference (e.g., Date).
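Here is a minimal sketch of the difference, using a hypothetical `Holder` class. `leakyGetList()` returns the private field itself. `copyGetList()` returns a defensive copy wrapped with `Collections.unmodifiableList`, along the lines of the unmodifiable copy mentioned above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class contrasting a leaking getter with a defensive-copy getter.
class Holder {
    private final List<String> items = new ArrayList<>();

    Holder(List<String> initial) {
        items.addAll(initial);  // copy on the way in as well
    }

    // Returns the private list itself: callers can change Holder's state.
    List<String> leakyGetList() {
        return items;
    }

    // Returns a snapshot copy, wrapped so the caller cannot modify the snapshot either.
    List<String> copyGetList() {
        return Collections.unmodifiableList(new ArrayList<>(items));
    }

    int size() {
        return items.size();
    }
}

class DefensiveCopyDemo {
    public static void main(String[] args) {
        Holder h = new Holder(List.of("a", "b"));

        h.leakyGetList().add("c");      // modifies h's private state
        System.out.println(h.size());   // 3

        List<String> snapshot = h.copyGetList();
        try {
            snapshot.add("d");          // rejected: the copy is unmodifiable
        } catch (UnsupportedOperationException e) {
            System.out.println("copy is read-only");
        }
        System.out.println(h.size());   // still 3
    }
}
```

The price is an O(n) copy on every call. That is usually acceptable for small lists, but it is worth keeping in mind for hot paths.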
It depends on what you want.
Do you want to expose the list and make it so people can edit it?
Or do you want to let people look at it, but not modify it?
There is no right or wrong way in this case. It just depends on your design needs.
There can be some cases where one would want to return the "raw" list to the caller. But in general, I think it is bad practice, as it breaks encapsulation and is therefore against OO principles.
If you must return the "raw" list and not a copy then it should be explicitly clear to the users of MyClass.
Yes, and it has a name: "defensive copy". Copying at the receiving end is also recommended. As Tom has noted, the behavior of the program is much easier to predict if the collection is immutable. So unless you have a very good reason not to, you should use an immutable collection.
When Google Guava becomes part of the Java standard library (I totally think it should), this would probably become the preferred idiom:
return ImmutableList.copyOf(someList);
and
void process(List<String> someList) { // method name is illustrative; it is missing in the original
someList = ImmutableList.copyOf(someList);
}
This has the added bonus of performance, because the copyOf() method checks whether the collection is already an immutable collection (instanceof ImmutableList) and, if so, skips the copying.
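You don't need Guava for this on Java 10 and later. The JDK's `List.copyOf` also returns an unmodifiable list and rejects null elements, and its implementation note says it will generally not create a copy when it is given a list that is already unmodifiable. Here is a minimal sketch of copying at the receiving end with it. The `Receiver` class and its `names()` accessor are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical receiver that copies its argument before storing it.
class Receiver {
    private final List<String> names;

    Receiver(List<String> names) {
        // Java 10+: unmodifiable snapshot; throws NullPointerException on null elements.
        this.names = List.copyOf(names);
    }

    List<String> names() {
        return names;  // safe to return directly: it is unmodifiable
    }
}

class CopyOfDemo {
    public static void main(String[] args) {
        List<String> source = new ArrayList<>(List.of("x", "y"));
        Receiver r = new Receiver(source);
        source.add("z");                    // later change by the caller
        System.out.println(r.names());      // [x, y]: the receiver's copy is unaffected
    }
}
```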
I think that the pattern of making fields private and providing accessors is simply meant for data encapsulation. If you want something to be truly private, don't give it accessor methods! You can then write other methods that return immutable versions of your private data or copies thereof.
Related
Suppose I have a private ArrayList or LinkedList inside a class, to which I will never assign a new reference; in other words, this will never happen:
myLinkedList = anotherLinkedList;
So I won't need setMyLinkedList(anotherLinkedList).
But! I need to add elements to it, or remove elements from it.
Should I write a new kind of setter that only adds instead of setting, e.g. one that calls myLinkedList.add(someElement)?
Or is it OK to do this through the getter without violating the encapsulation principle?
getMyLinkedList().add(someElement)
(Suppose I am going to lose marks if I violate encapsulation :-")
I don't think it's a particularly good practice to do something like:
myObj.getMyList().add(x);
since you are exposing a private field in a way that is not read-only. That said, I do see it pretty frequently (I'm looking at you, auto-generated classes). Instead of doing it that way, I would argue for returning an unmodifiable list and letting users of the class add to the list via an explicit method:
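import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class MyClass{
private final List<String> myList = new ArrayList<String>();
public List<String> getList(){
// read-only view: callers can look but not modify
return Collections.unmodifiableList(this.myList);
}
public void addToList(final String s){
this.myList.add(s);
}
}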
EDIT: After reviewing your comments, I wanted to add a bit about your setter idea:
I meant using that line of code inside a new kind of setter inside the class itself, like public void setter(someElement){this.myLinkedList.add(someElement);}
If I'm understanding you correctly, you want to expose a method that only adds to your list. Overall, this is what I think you should be aiming for, and it is what many of the answers describe. However, calling it a setter is a bit misleading, since you are not reassigning (setting) anything. I also strongly recommend returning a read-only list from your getter method if possible.
In this case I would suggest following the encapsulation principle and using a method to add elements to the list. You have restricted access to your list by making it private, so that other classes cannot access the data structure directly.
Let the class that stores your ArrayList have direct access to the list, but when other classes want to add to the list, use an add() method.
In general, you should not assume that the list returned by the getter is the original one. It could be decorated or proxied, for example.
If you want to prevent that a new list is set on the target object, you could define an add method on the target class instead.
As soon as you have a Collection of any kind, it is generally not a bad idea to add methods like add() and remove() to the interface of your class, if it makes sense for clients to add objects to or remove them from your private list.
These extra methods might seem like overkill, since they mostly just call the corresponding method on the Collection. They are still useful because they prevent malicious clients from doing things to your list that you don't want them to do. The interface of most Collections contains much more than add() and remove(), and you usually don't want clients messing around with things you can't control. That is why the encapsulation principle matters so much to your teacher.
Another plus: if you ever decide that a certain condition must be met when an object is added to your list, you can easily implement it in the method you already have. If you give a client direct access to your list's reference, implementing this kind of thing (which is not rare) is not easy at all.
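To make that last point concrete, here is a minimal sketch in which the condition lives in the class's own `add()` method and callers only get a read-only view. The `Roster` class and its rule of rejecting null or blank names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class: clients may add and remove names, but only through these methods.
class Roster {
    private final List<String> names = new ArrayList<>();

    // The condition is enforced in one place; a caller holding the raw list could bypass it.
    public void add(String name) {
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("name must be non-empty");
        }
        names.add(name);
    }

    public boolean remove(String name) {
        return names.remove(name);
    }

    // Read-only view for clients that only need to look.
    public List<String> getNames() {
        return Collections.unmodifiableList(names);
    }
}

class RosterDemo {
    public static void main(String[] args) {
        Roster r = new Roster();
        r.add("Alice");
        try {
            r.add("  ");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(r.getNames()); // [Alice]
    }
}
```

If the rule changes later, only `add()` needs to change, and callers are unaffected.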
Hope this helps
So you have a class containing a List field (it should be final, since you don't intend to assign to it), and you want to allow callers to add to the List, but not be able to replace it.
You could either provide a getter for the list:
public List<E> getMyList() {
return myList;
}
Or provide a method to add to that list:
public void addToMyList(E e) {
myList.add(e);
}
Both are valid design decisions, but which you use will depend on your use case. The first option gives callers direct access to the List, effectively making it public. This is useful when users will be modifying and working with the list repeatedly, but can be problematic as you can no longer trust the List is in any sort of reliable state (the caller could empty it, or reorder it, or even maliciously insert objects of a different type). So the first option should only be used when you intend to trust the caller.
The second option gives the caller less power, because they can only add one element at a time. If you want to provide additional features (insertion, add-all, etc.) you'll have to wrap each operation in turn. But it gives you more confidence, since you can be certain the List is only being modified in ways you approve of. This latter option also hides (encapsulates) the implementation detail that you're using a List at all, so if encapsulation is important for your use case, you want to go this way to avoid exposing your internal data structures, and only expose the behavior you want to grant to callers.
It depends on the application - both are acceptable. Take a good look at the class you're writing and decide if you want to allow users to directly access the contents of the list, or if you would prefer that they go through some intermediate process first.
For example, say you have a class ListEncrypter which holds your MyLinkedList list. The purpose of this class is to encrypt anything that is stored in MyLinkedList. In this case, you'd want to provide a custom add method in order to process the added item before placing it in the list, and if you want to access the element, you'd also process it:
public void add(Object element)
{
MyLinkedList.add(encrypt(element));
}
public Object get(int index)
{
return decrypt(MyLinkedList.get(index));
}
In this case, you clearly want to deny the user's access to the MyLinkedList variable, since the contents will be encrypted and they won't be able to do anything with it.
On the other hand, if you're not really doing any processing of the data (and you're sure you won't ever need to in the future), you can skip creating the specialized methods and just allow the user to directly access the list via the get method.
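A runnable version of the ListEncrypter idea might look like the sketch below. It assumes the elements are Strings and uses Base64 purely as a stand-in for `encrypt`/`decrypt`. Base64 is an encoding, not encryption, so a real implementation would use an actual cipher (for example from `javax.crypto`).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedList;
import java.util.List;

// Sketch: values are transformed on the way in and out, so the raw list is never exposed.
class ListEncrypter {
    private final List<String> myLinkedList = new LinkedList<>();

    public void add(String element) {
        myLinkedList.add(encrypt(element));
    }

    public String get(int index) {
        return decrypt(myLinkedList.get(index));
    }

    // Placeholder transforms: Base64 only encodes, it does NOT encrypt.
    private static String encrypt(String plain) {
        return Base64.getEncoder().encodeToString(plain.getBytes(StandardCharsets.UTF_8));
    }

    private static String decrypt(String stored) {
        return new String(Base64.getDecoder().decode(stored), StandardCharsets.UTF_8);
    }
}

class ListEncrypterDemo {
    public static void main(String[] args) {
        ListEncrypter enc = new ListEncrypter();
        enc.add("secret");
        System.out.println(enc.get(0)); // secret
    }
}
```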
The question is framed for List but easily applies to other types in the Java Collections Framework.
For example, I would say it is certainly appropriate to make a new List sub-type to store something like a counter of additions since it is an integral part of the list's operation and doesn't alter that it "is a list". Something like this:
public class ChangeTrackingList<E> extends ArrayList<E> {
private int changeCount;
...
@Override public boolean add(E e) {
changeCount++;
return super.add(e);
}
// other methods likewise overridden as appropriate to track change counter
}
However, what about adding additional functionality out of the knowledge of a list ADT, such as storing arbitrary data associated with a list element? Assuming the associated data was properly managed when elements are added and removed, of course. Something like this:
public class PayloadList<E> extends ArrayList<E> {
private Object[] payload;
...
public void setData(int index, Object data) {
... // manage 'payload' array
payload[index] = data;
}
public Object getData(int index) {
... // manage 'payload' array, error handling, etc.
return payload[index];
}
}
In this case I have changed the fact that it is "just a list" by adding not only additional functionality (behavior) but also additional state. That is certainly part of the purpose of type specification and inheritance, but is there an implicit restriction (taboo, deprecation, poor practice, etc.) that makes Java collection types a special case?
For example, when referencing this PayloadList as a java.util.List, one will mutate and iterate as normal and ignore the payload. This is problematic when it comes to something like persistence or copying which does not expect a List to carry additional data to be maintained. I've seen many places that accept an object, check to see that it "is a list" and then simply treat it as java.util.List. Should they instead allow arbitrary application contributions to manage specific concrete sub-types?
Since this implementation would constantly produce issues in instance slicing (ignoring sub-type fields) is it a bad idea to extend a collection in this way and always use composition when there is additional data to be managed? Or is it instead the job of persistence or copying to account for all concrete sub-types including additional fields?
This is purely a matter of opinion, but personally I would advise against extending classes like ArrayList in almost all circumstances, and favour composition instead.
Even your ChangeTrackingList is rather dodgy. What does
list.addAll(Arrays.asList("foo", "bar"));
do? Does it increment changeCount twice, or not at all? That depends on whether ArrayList.addAll() calls add(), which is an implementation detail you should not have to worry about. In fact, AbstractCollection.addAll() is documented to call add() for each element, but OpenJDK's ArrayList overrides addAll() to copy the elements in one go with System.arraycopy. An add()-only override therefore never sees them, and changeCount is not incremented at all. Because this is an implementation detail, it could also change in a future release, so you would have to keep your overrides in sync with the ArrayList methods indefinitely.
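The pitfall is easy to reproduce. The sketch below is a stripped-down ChangeTrackingList that only overrides `add(E)`. On OpenJDK, `ArrayList.addAll()` copies the array directly and never calls `add()`, so the counter misses those elements. The exact count is implementation-dependent, and that is exactly the problem.

```java
import java.util.ArrayList;
import java.util.Arrays;

// Deliberately fragile: only add(E) is overridden.
class ChangeTrackingList<E> extends ArrayList<E> {
    private int changeCount;

    @Override
    public boolean add(E e) {
        changeCount++;
        return super.add(e);
    }

    int getChangeCount() {
        return changeCount;
    }
}

class AddAllPitfallDemo {
    public static void main(String[] args) {
        ChangeTrackingList<String> list = new ChangeTrackingList<>();
        list.add("a");
        list.addAll(Arrays.asList("foo", "bar"));
        // Three elements were added, but on OpenJDK the counter only saw add("a").
        System.out.println(list.size() + " elements, " + list.getChangeCount() + " counted");
    }
}
```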
Also, if a method is ever added to List or one of its supertypes (this happened in Java 8, when default methods such as forEach() and stream() were added), it would cause problems if you were already using that method name to mean something else. This can happen when extending abstract classes too, but an abstract class like AbstractList carries very little state of its own, so you are less likely to cause much damage by overriding its methods.
I would still use the List interface, and ideally extend AbstractList. Something like this:
public final class PayloadList<E> extends AbstractList<E> implements RandomAccess {
private final ArrayList<E> list;
private final Object[] payload;
// details missing
}
That way you have a class that implements List and makes use of ArrayList without you having to worry about implementation details.
(By the way, in my opinion, the class AbstractList is amazing. You only have to implement get() and size() to have a functioning (read-only) List implementation, and methods like containsAll(), toString() and stream() all just work.)
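Following that suggestion, here is a minimal sketch of a change-counting list built on AbstractList around a private ArrayList. The name `TrackingList` is hypothetical. A modifiable list has to override `set()`, `add(int, E)` and `remove(int)` in addition to `get()` and `size()`. AbstractList's inherited `add(E)`, `addAll()` and `remove(Object)` are documented to route through those methods, so the counter no longer depends on ArrayList internals.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.RandomAccess;

// Composition: the ArrayList is a private detail; AbstractList supplies the rest of List.
final class TrackingList<E> extends AbstractList<E> implements RandomAccess {
    private final List<E> delegate = new ArrayList<>();
    private int changeCount;

    @Override public E get(int index) { return delegate.get(index); }

    @Override public int size() { return delegate.size(); }

    @Override public E set(int index, E element) {
        E old = delegate.set(index, element);
        changeCount++;
        return old;
    }

    @Override public void add(int index, E element) {
        delegate.add(index, element);
        changeCount++;
        modCount++;   // structural change: keeps AbstractList's iterators fail-fast
    }

    @Override public E remove(int index) {
        E removed = delegate.remove(index);
        changeCount++;
        modCount++;
        return removed;
    }

    int getChangeCount() { return changeCount; }
}

class TrackingListDemo {
    public static void main(String[] args) {
        TrackingList<String> list = new TrackingList<>();
        list.add("a");                              // AbstractList.add(E) -> add(size(), e)
        list.addAll(Arrays.asList("foo", "bar"));   // inherited addAll -> add(E) per element
        list.remove("foo");                         // iterator removal -> remove(int)
        System.out.println(list + " changes=" + list.getChangeCount()); // [a, bar] changes=4
    }
}
```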
One aspect you should consider is that all classes that inherit from AbstractList are value classes. That means they have meaningful equals(Object) and hashCode() methods, so two lists are judged to be equal even if they are different instances, or even instances of different classes.
Furthermore, the equals() contract from AbstractList allows any list to be compared with the current list - not just a list with the same implementation.
Now, if you add a value item to a value class when you extend it, you need to include that value item in the equals() and hashCode() methods. Otherwise you will allow two PayloadList lists with different payloads to be considered "the same" when somebody uses them in a map, a set, or just a plain equals() comparison in any algorithm.
But in fact, it's impossible to extend a value class by adding a value item without breaking the equals() contract. You'll break either symmetry (a plain ArrayList containing [1,2,3] will return true when compared with a PayloadList containing [1,2,3] with a payload of [a,b,c], but the reverse comparison won't return true) or transitivity.
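The broken symmetry can be shown with a deliberately simplified, hypothetical PayloadList. Here the payload is kept in a map keyed by index and is not kept in sync with add/remove. It exists only to show what happens when `equals()` is made payload-aware.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Simplified sketch: payload is keyed by index and NOT maintained on add/remove.
class PayloadList<E> extends ArrayList<E> {
    private final Map<Integer, Object> payload = new HashMap<>();

    void setData(int index, Object data) {
        Objects.checkIndex(index, size());   // Java 9+
        payload.put(index, data);
    }

    Object getData(int index) {
        return payload.get(index);
    }

    // Payload-aware equality: only another PayloadList with equal elements
    // and an equal payload counts as equal.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PayloadList)) return false;
        PayloadList<?> other = (PayloadList<?>) o;
        return super.equals(other) && payload.equals(other.payload);
    }

    @Override
    public int hashCode() {
        return 31 * super.hashCode() + payload.hashCode();
    }
}

class EqualsSymmetryDemo {
    public static void main(String[] args) {
        List<Integer> plain = new ArrayList<>(List.of(1, 2, 3));
        PayloadList<Integer> withPayload = new PayloadList<>();
        withPayload.addAll(List.of(1, 2, 3));
        withPayload.setData(0, "a");

        System.out.println(plain.equals(withPayload)); // true: ArrayList only compares elements
        System.out.println(withPayload.equals(plain)); // false: symmetry is broken
    }
}
```

If you instead make `PayloadList.equals()` ignore the payload whenever the other object is a plain list, you restore symmetry but lose transitivity. Two PayloadLists with different payloads would then both equal the same plain list without equaling each other.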
This means that basically, the only proper way to extend a value class is by adding non-value behavior (e.g. a list that logs every add() and remove()). The only way to avoid breaking the contract is to use composition. And it has to be composition that does not implement List at all (because again, other lists will accept anything that implements List and gives the same values when iterating it).
This answer is based on Item 8 of Effective Java, 2nd Edition, by Joshua Bloch.
If the class is not final, you can always extend it. Everything else is subjective and a matter of opinion.
My opinion is to favor composition over inheritance, since in the long run, inheritance produces low cohesion and high coupling, which is the opposite of a good OO design. But this is just my opinion.
The following is all just opinion; the question invites opinionated answers (I think it's borderline inappropriate for SO).
While your approach is workable in some situations, I'd argue it's bad design and very brittle. It's also pretty complicated to cover all the loopholes (maybe even impossible, for example when the list is sorted by means other than List.sort).
If you require extra data to be stored and managed it would be better integrated into the list items themselves, or the data could be associated using existing collection types like Map.
If you really need an association list type, consider making it not an instance of java.util.List, but a specialized type with specialized API. That way no accidents are possible by passing it where a plain list is expected.
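Here is a minimal sketch combining both suggestions. The payload travels inside each entry, and the type exposes only the operations it supports instead of implementing `java.util.List`, so it cannot be passed where a plain list is expected. `AssociationList` and `Entry` are hypothetical names, and the record requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Each element carries its own payload, so moving an entry keeps value and data together.
record Entry<E, D>(E value, D data) {}

// Deliberately NOT a java.util.List: only operations that keep value and data in sync.
final class AssociationList<E, D> {
    private final List<Entry<E, D>> entries = new ArrayList<>();

    void add(E value, D data) {
        entries.add(new Entry<>(value, data));
    }

    E getValue(int index) {
        return entries.get(index).value();
    }

    D getData(int index) {
        return entries.get(index).data();
    }

    void remove(int index) {
        entries.remove(index);   // value and data go away together
    }

    int size() {
        return entries.size();
    }

    // Read-only snapshot of the values for code that only needs a plain list.
    List<E> values() {
        List<E> result = new ArrayList<>();
        for (Entry<E, D> e : entries) result.add(e.value());
        return Collections.unmodifiableList(result);
    }
}

class AssociationListDemo {
    public static void main(String[] args) {
        AssociationList<String, Integer> al = new AssociationList<>();
        al.add("a", 1);
        al.add("b", 2);
        al.remove(0);
        System.out.println(al.getValue(0) + " -> " + al.getData(0)); // b -> 2
    }
}
```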
I often encounter code like this:
class AClass{
private Iterable<String> list;
public AClass(Iterable<String> list){ this.list = list; }
...
}
In this code, a reference to an Iterable is passed to AClass directly. The end result is equivalent to exposing the list reference directly to the outside. Even if you make AClass.list final, code outside AClass can still modify the contents of the list, which is bad.
To counter this, we can make a defensive copy in the constructor.
However, this kind of code is very common. Besides performance considerations, what is the intention behind writing code like this?
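The defensive copy mentioned in the question could look like the following version of AClass. It is a minimal sketch: since an Iterable is not a Collection, the elements are copied one by one into an ArrayList and then wrapped as unmodifiable. The `getList()` accessor is hypothetical. This is a shallow copy: it protects the list's structure but not mutable elements.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class AClass {
    private final List<String> list;

    public AClass(Iterable<String> source) {
        List<String> copy = new ArrayList<>();
        for (String s : source) {
            copy.add(s);                 // shallow copy: element references are shared
        }
        this.list = Collections.unmodifiableList(copy);
    }

    List<String> getList() {
        return list;
    }
}

class AClassDemo {
    public static void main(String[] args) {
        List<String> src = new ArrayList<>(List.of("a", "b"));
        AClass a = new AClass(src);
        src.add("c");                        // caller changes its own list afterwards
        System.out.println(a.getList());     // [a, b]
    }
}
```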
I don't see anything wrong with that pattern. If the class represents objects that operate on a list (or an iterable) then it's natural to provide that list to the constructor. If your class can't handle changes to the underlying collection, then it needs to be fixed or documented. Making a copy of the collection is one way to fix that.
Another option is to change the interface so that only immutable collections are allowed:
public AClass(ImmutableList<MyObject> objects) {
this.objects = objects;
...
}
You would need some kind of ImmutableList class or interface, of course.
Depending on the use and users of your classes you could also avoid making copies by documenting the known "weakness":
/**
* ...
* @param objects list of objects this AClass object operates on.
* The list should not be modified during the lifetime
* of this object
*/
public AClass(List<MyObject> objects) ...
Simple answer: if it is your own code or a small team, it is often just quicker, easier, and less memory- and CPU-intensive to do things this way. Also, some people just don't know any better!
You might want to take a look at the copy constructor for a familiar idiom.
It's always good practice to make a copy, not only because otherwise other people can modify your values, but also for security reasons.
If the code is only used internally, as other answers point out, it should not be a problem. But if you are exposing it as an API, there are two options:
First, create a defensive copy and return it.
Second, wrap the collection with Collections.unmodifiableCollection(), return that, and document that trying to change anything in the collection may throw an exception. Note that this is a read-only view, so callers will still see later changes you make to the underlying collection.
But the first option is preferable.
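The two options behave differently when the owner changes the collection later. Here is a minimal sketch with a hypothetical `Inventory` class: `Collections.unmodifiableCollection` returns a read-only view that still reflects later changes, while a copy is a snapshot.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class Inventory {
    private final List<String> items = new ArrayList<>();

    void add(String item) { items.add(item); }

    // Option 1: defensive copy, a snapshot the caller may freely modify.
    List<String> itemsCopy() { return new ArrayList<>(items); }

    // Option 2: unmodifiable view; writes throw, but later changes to items show through.
    Collection<String> itemsView() { return Collections.unmodifiableCollection(items); }
}

class ViewVsCopyDemo {
    public static void main(String[] args) {
        Inventory inv = new Inventory();
        inv.add("apple");
        List<String> copy = inv.itemsCopy();
        Collection<String> view = inv.itemsView();

        inv.add("pear");
        System.out.println(copy);   // [apple]
        System.out.println(view);   // [apple, pear]
        try {
            view.add("plum");
        } catch (UnsupportedOperationException e) {
            System.out.println("view is read-only");
        }
    }
}
```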
I have an object that stores some data in a list. The implementation could change later, and I don't want to expose the internal implementation to the end user. However, the user must have the ability to modify and access this collection of data. Currently I have something like this:
public List<SomeDataType> getData() {
return this.data;
}
public void setData(List<SomeDataType> data) {
this.data = data;
}
Does this mean that I have allowed the internal implementation details to leak out? Should I be doing this instead?
public Collection<SomeDataType> getData() {
return this.data;
}
public void setData(Collection<SomeDataType> data) {
this.data = new ArrayList<SomeDataType>(data);
}
It just depends: do you want your users to be able to index into the data? If so, use List. Both are interfaces, so you're not really leaking implementation details; you just need to decide the minimum functionality needed.
Returning a List is in line with programming to the Highest Suitable Interface.
Returning a Collection would be ambiguous for the user, as the returned collection could be a Set, a List or a Queue.
Independent of the ability to index into the list via List.get(int), do the users (or you) have an expectation that the elements of the collection are in a reliable and predictable order? Can the collection have multiples of the same item? Both of these are expectations of lists that are not common to more general collections. These are the tests I use when determining which abstraction to expose to the end user.
When returning an implementation of an interface or class that is in a tall hierarchy, the rule of thumb is that the declared return type should be the HIGHEST level that provides the minimum functionality you are prepared to guarantee to the caller and that the caller reasonably needs. For example, suppose what you really return is an ArrayList. ArrayList implements List and Collection (among other things). If you expect the caller to need the get(int x) method, then returning a Collection won't work; you'll need to return a List or ArrayList. As long as you don't see any reason why you would ever change your implementation to use something other than a list (say, a Set), the right answer is to return a List. ArrayList does have a few public methods that aren't in List, such as ensureCapacity() and trimToSize(); if callers genuinely needed those, the same reasoning would apply. On the other hand, once you return a List instead of a Collection, you have locked in your implementation to some extent. The less you put in your API, the less restriction you put on future improvements.
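Here is a minimal sketch of that rule of thumb using a hypothetical `TagStore`. The getter is declared to return `Collection<String>`, so the backing `ArrayList` could later be swapped for, say, a `LinkedHashSet` without any caller having to change. That swap would not be possible if callers had been promised a `List`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

class TagStore {
    // Internal choice; replacing this with new LinkedHashSet<>() would not affect callers.
    private final Collection<String> tags = new ArrayList<>();

    void addTag(String tag) {
        tags.add(tag);
    }

    // Guarantees only what Collection guarantees: size, iteration, contains, ...
    // No get(int), and no promise about order or duplicates.
    Collection<String> getTags() {
        return Collections.unmodifiableCollection(tags);
    }
}

class TagStoreDemo {
    public static void main(String[] args) {
        TagStore store = new TagStore();
        store.addTag("java");
        store.addTag("collections");
        System.out.println(store.getTags().contains("java")); // true
    }
}
```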
(In practice, I almost always return a List in such situations, and it has never burned me. But I probably really should return a Collection.)
Using the most general type, which is Collection, makes the most sense unless there is some explicit reason to use the more specific type - List. But whatever you do, if this is an API for public consumption be clear in the documentation what it does; if it returns a shallow copy of the collection say so.
Yes, your first alternative does leak implementation details if it's not part of your interface contract that the method will always return a List. Also, allowing user code to replace your collection instance is somewhat dangerous, because the implementation they pass in may not behave as you expect.
Of course, it's all a matter of how much you trust your users. If you take the Python philosophy that "we're all consenting adults here" then the first method is just fine. If you think that your library will be used by inexperienced developers and you need to do all you can to "babysit" them and make sure they don't do something wrong then it's preferable not to let them set the collection and not to even return the actual collection. Instead return a (shallow) copy of it.
It depends on what guarantees you want to provide to the user. If the data is sequential, such that the order of the elements matters, and you allow duplicates, then use a List. If the order of elements does not matter and duplicates may or may not be allowed, then use a Collection. Since you are actually returning the underlying collection, you should have only a get function and not a set function as well, because the returned collection may be mutated. Also, providing a set function lets the user change the type of collection, whereas you probably want to control the particular type yourself.
If I were concerned with hiding the internal representation of my data from an outside user, I would use either XML or JSON. Either way, they're fairly universal.
In Java, say you have a class that wraps an ArrayList (or any collection) of objects.
How would you return one of those objects such that the caller will not see any future changes to the object made in the ArrayList?
That is, you want to return a deep copy of the object, but you don't know whether it is cloneable.
Turn that into a spec:
- objects must implement an interface in order to be allowed into the collection, something like ArrayList<ICloneable>.
Then you can be assured that you always do a deep copy - the interface should have a method that is guaranteed to return a deep copy.
I think that's the best you can do.
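Here is a minimal sketch of that spec, with hypothetical names throughout. Elements must implement a `DeepCopyable<T>` interface, standing in for the `ICloneable` above. The wrapper copies elements on the way in and hands out copies on the way out, never the stored instances.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical contract: implementations must return an independent deep copy.
interface DeepCopyable<T> {
    T deepCopy();
}

// Hypothetical mutable element type.
final class Point implements DeepCopyable<Point> {
    int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public Point deepCopy() { return new Point(x, y); }  // no nested mutable state to copy
}

// Only DeepCopyable elements are accepted, so every get() can return a copy.
final class CopyingList<T extends DeepCopyable<T>> {
    private final List<T> items = new ArrayList<>();

    void add(T item) { items.add(item.deepCopy()); }   // copy on the way in too

    T get(int index) { return items.get(index).deepCopy(); }
}

class DeepCopyDemo {
    public static void main(String[] args) {
        CopyingList<Point> points = new CopyingList<>();
        points.add(new Point(1, 2));

        Point p = points.get(0);
        p.x = 99;                                  // changes only the caller's copy
        System.out.println(points.get(0).x);       // 1
    }
}
```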
One option is to use serialization. Here's a blog post explaining it:
http://weblogs.java.net/blog/emcmanus/archive/2007/04/cloning_java_ob.html
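The serialization approach can be sketched with the standard `java.io` object streams. This assumes every object in the graph implements `java.io.Serializable`. It is typically much slower than a hand-written copy, and the `SerialCopier.deepCopy` helper is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

final class SerialCopier {
    // Writes the object graph to a byte array and reads it back, producing a deep copy.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            // A NotSerializableException (an IOException) ends up here if any object isn't Serializable.
            throw new IllegalArgumentException("could not deep-copy object", e);
        }
    }
}

class SerialCopyDemo {
    public static void main(String[] args) {
        ArrayList<StringBuilder> original = new ArrayList<>(List.of(new StringBuilder("a")));
        ArrayList<StringBuilder> copy = SerialCopier.deepCopy(original);
        copy.get(0).append("b");              // mutate the copied element
        System.out.println(original.get(0));  // a: the original element is untouched
    }
}
```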
I suppose it is an obvious answer:
Make it a requirement that the classes stored in the collection be cloneable. You could check that at insertion time or at retrieval time, whichever makes more sense, and throw an exception.
Or, if the item is not cloneable, just fall back to returning it by reference.