Understanding polymorphism in Java

I haven't written any Java code in more than 10 years. I'm enjoying it, but I don't think I get some of the details of polymorphic programming. I have an abstract Node class, which has tag and data subclasses (among others), and I store them in an ArrayList.
But when I get them out of the ArrayList via an Iterator, I get Node objects back. I'm not sure how best to deal with the objects I get back from the iterator.
Here's an example:
// initialize the list
TagNode tag = new TagNode();
ArrayList<Node> list = new ArrayList<>();
list.add(tag);
// And many more go into the list, some TagNodes, some DataNodes, etc.
and later I use an iterator to process them:
Iterator<Node> i = list.iterator();
Node n = i.next();
// How do I tell if n is a TagNode or a DataNode?
I know that I can cast to one of the Node subclasses, but how do I know which subclass to use? Do I need to embed type information inside the Node classes?

You should not need to know which child class is which, in most circumstances.
That is precisely the advantage with polymorphism.
If your hierarchical design is solid, the Node will have all the behaviors (== methods) needed to perform operations on your List items without worrying about which child class they are an instance of: overridden methods resolve at runtime.
In some cases, you might want to use the instanceof operator to check which child class your Node actually belongs to, but I would consider that a rare case, best avoided on general principle.
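As a rough illustration of the point above, here is a minimal sketch of the Node hierarchy from the question. The render() method and the constructor arguments are assumptions made for the example; the question does not say what the real classes contain.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hierarchy modeled on the question; render() is an assumed method.
abstract class Node {
    abstract String render();
}

class TagNode extends Node {
    private final String name;
    TagNode(String name) { this.name = name; }
    @Override String render() { return "<" + name + ">"; }
}

class DataNode extends Node {
    private final String text;
    DataNode(String text) { this.text = text; }
    @Override String render() { return text; }
}

class PolymorphismDemo {
    // Concatenates the output of every node without checking its concrete type.
    static String renderAll(List<Node> nodes) {
        StringBuilder out = new StringBuilder();
        for (Node n : nodes) {
            out.append(n.render()); // resolved at runtime to the subclass override
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Node> list = new ArrayList<>();
        list.add(new TagNode("p"));
        list.add(new DataNode("hello"));
        System.out.println(renderAll(list)); // prints <p>hello
    }
}
```

The loop never asks which subclass it is holding; adding a third Node subclass would not require touching renderAll().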

Ideally you don't want to treat them differently, but if you wanted to determine the type, you can check using instanceof:
Node n = i.next();
if (n instanceof TagNode) {
// behavior
}

As others said, explicitly checking which subclass your object belongs to should not be necessary and is also bad style. But if you really need it, you can use the instanceof operator.

Polymorphic behavior means that you don't really care what type of Node it is. You simply call the API, and the behavior will follow the implementation of the concrete type.
But if you really need to know, one way is to use instanceof to find out the exact type. For example:
if (n instanceof TagNode) {
// handle tag
} else if (n instanceof DataNode) {
// handle data
}
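For the rare case where the type check is really needed, a small self-contained sketch might look like the following. The tagName() and text() methods are hypothetical stand-ins for whatever subclass-specific members the real classes have.

```java
abstract class Node {}

class TagNode extends Node {
    String tagName() { return "div"; } // hypothetical TagNode-only method
}

class DataNode extends Node {
    String text() { return "some text"; } // hypothetical DataNode-only method
}

class InstanceofDemo {
    // Builds a description that needs subclass-specific methods.
    static String describe(Node n) {
        if (n instanceof TagNode) {
            TagNode tag = (TagNode) n; // safe: guarded by the instanceof check
            return "tag:" + tag.tagName();
        } else if (n instanceof DataNode) {
            DataNode data = (DataNode) n;
            return "data:" + data.text();
        }
        return "unknown";
    }
}
```

Each cast is guarded by the matching instanceof test, so it cannot throw a ClassCastException.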

Related

Should we use the topmost parent class as a type of reference variable?

I have seen some people using the topmost parent class as a variable type to hold the child instance and some people use just parent class only. For example:
Collection obj = new ArrayList();
Or
List obj = new ArrayList();
Since List is a subinterface of Collection, can't we use the first line instead of the second?
Likewise, couldn't we use a Collection reference everywhere in the collections framework to hold an instance of any class that implements Collection?
Is this a good practice?
So, I wanted to know which comes under the best practices and why?
A technical justification (performance concerns, etc.) would be greatly appreciated.
It really depends on your needs. In your example it doesn't change much for basic needs, but if you inspect the two interfaces there are some differences. Look:
https://docs.oracle.com/javase/7/docs/api/java/util/Collection.html
and
https://docs.oracle.com/javase/7/docs/api/java/util/List.html
Notice that List gives you access to methods that Collection doesn't.
set(int index, E element), for instance, is defined in the List interface and not in Collection.
This is because not every class that implements Collection needs to provide the same methods.
Performance-wise, it has no impact.
Always use the top-most parent type that has all the functionality you need. In your example there is no need to go higher than List.
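To make that rule of thumb concrete, here is a small sketch with two hypothetical helper methods (the names totalLength and middle are invented for this example). The first only iterates, so Collection is enough; the second needs positional access, so it has to ask for a List.

```java
import java.util.Collection;
import java.util.List;

class ReferenceTypeDemo {
    // Only needs iteration, so any Collection will do (List, Set, ...).
    static int totalLength(Collection<String> words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }

    // Needs positional access, so the parameter must be a List.
    static String middle(List<String> words) {
        return words.get(words.size() / 2);
    }
}
```

A caller can pass an ArrayList, a LinkedList, or a HashSet to totalLength(), but only a List to middle().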
There is no so-called "best practice" for choosing the class to use as the reference type. In fact, the class at the top of the hierarchy is Object. Do you use Object as the reference type for everything you do? No, but generally you may choose the highest class that has all the methods you need.
Instead of following a so-called "best practice", apply what best suits your situation.
These are some pros and cons for using higher hierarchy classes as reference type:
Advantage
Allows grouping of objects which share the same ancestor (superclass)
Allows all instances of the given class to be assigned to it
Animal dog = new Dog();
Animal cat = new Cat();
Allows polymorphism
dog.makeNoise();
cat.makeNoise();
It is only an advantage when you are accessing common behaviours or members.
Disadvantage
Requires casting when you are accessing behaviours which exist in one subclass but not the other.
dog.swim(); // compile error: class Animal does not have swim()
((Dog) dog).swim();
As you start dumping various objects into the common parent type, you may have a hard time figuring out which members belong to which class.
((Cat) cat).swim(); // compile error: class Cat does not have swim()
The general idea is to hide as much as you can so things are easier to change. If you need indexing, for instance (List.get(int index)), then it MUST be a List, because a Collection does not support .get(index). If you don't need indexing, then hiding the fact that you're using a list means you can later switch to other collections that might not be lists without any trouble.
For example, maybe one month later I want to use a Set instead of a List. But Set doesn't support .get(index). Anybody who uses this List might rely on the indexing features of a list, and that would make it difficult to switch to a Set, because everywhere someone else used .get() would break.
On the other hand, excessively hiding your types can cause accidental performance issues because a consumer of your method didn't know the type. Suppose you return a List that's actually a LinkedList (where indexing is O(n)). Suppose the consumer of this list does a lookup for each entry in another list. That can be O(n*m), which is really slow. If you had advertised that it was a LinkedList in the first place, the consumer would realize that it's probably not a good idea to index into this list repeatedly and could make a local copy.
Library code (suppose the one you're designing)
public class Util {
public static List<String> makeData() {
return new LinkedList<>(Arrays.asList("dogs", "cats", "zebras", "deer"));
}
}
Caller's code (suppose the one that's using your library or method)
public static void main(String [] args) {
List<String> data = Util.makeData();
int [] indicesToLookUp = {1,4,2,3,0};
for( int idx : indicesToLookUp ) {
if(idx < data.size()) {
// each lookup (LinkedList.get()) is slow: O(N)
doSomethingWithEntry(idx, data.get(idx));
}
}
}
You could argue it's the caller's fault because he incorrectly assumed the List is an ArrayList<> and should have made a local copy of the list.
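One defensive option for the caller, sketched below under the assumption that a one-time copy is acceptable, is to check for the java.util.RandomAccess marker interface and copy into an ArrayList only when the list does not advertise fast indexing. The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.RandomAccess;

class IndexedLookupDemo {
    // Returns a list with fast get(int); copies only when the source lacks RandomAccess.
    static <T> List<T> forIndexedAccess(List<T> source) {
        if (source instanceof RandomAccess) {
            return source;
        }
        return new ArrayList<>(source); // one O(n) copy instead of O(n) per lookup
    }

    // Collects the entries at the given indices, skipping out-of-range ones.
    static List<String> lookUp(List<String> data, int[] indices) {
        List<String> fast = forIndexedAccess(data);
        List<String> found = new ArrayList<>();
        for (int idx : indices) {
            if (idx >= 0 && idx < fast.size()) {
                found.add(fast.get(idx));
            }
        }
        return found;
    }
}
```

ArrayList implements RandomAccess and LinkedList does not, so the copy is made only in the slow case.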

Is it appropriate to extend List to add fields

The question is framed for List but easily applies to others in the java collections framework.
For example, I would say it is certainly appropriate to make a new List sub-type to store something like a counter of additions since it is an integral part of the list's operation and doesn't alter that it "is a list". Something like this:
public class ChangeTrackingList<E> extends ArrayList<E> {
private int changeCount;
...
@Override public boolean add(E e) {
changeCount++;
return super.add(e);
}
// other methods likewise overridden as appropriate to track change counter
}
However, what about adding functionality outside the scope of a list ADT, such as storing arbitrary data associated with a list element? Assuming the associated data was properly managed when elements are added and removed, of course. Something like this:
public class PayloadList<E> extends ArrayList<E> {
private Object[] payload;
...
public void setData(int index, Object data) {
... // manage 'payload' array
payload[index] = data;
}
public Object getData(int index) {
... // manage 'payload' array, error handling, etc.
return payload[index];
}
}
In this case I have altered that it is "just a list" by adding not only additional functionality (behavior) but also additional state. Certainly part of the purpose of type specification and inheritance, but is there an implicit restriction (taboo, deprecation, poor practice, etc.) on Java collections types to treat them specially?
For example, when referencing this PayloadList as a java.util.List, one will mutate and iterate as normal and ignore the payload. This is problematic when it comes to something like persistence or copying which does not expect a List to carry additional data to be maintained. I've seen many places that accept an object, check to see that it "is a list" and then simply treat it as java.util.List. Should they instead allow arbitrary application contributions to manage specific concrete sub-types?
Since this implementation would constantly produce issues in instance slicing (ignoring sub-type fields) is it a bad idea to extend a collection in this way and always use composition when there is additional data to be managed? Or is it instead the job of persistence or copying to account for all concrete sub-types including additional fields?
This is purely a matter of opinion, but personally I would advise against extending classes like ArrayList in almost all circumstances, and favour composition instead.
Even your ChangeTrackingList is rather dodgy. What does
list.addAll(Arrays.asList("foo", "bar"));
do? Does it increment changeCount twice, or not at all? It depends on whether ArrayList.addAll() uses add(), which is an implementation detail you should not have to worry about. You would also have to keep your methods in sync with the ArrayList methods. For example, suppose addAll(Collection<? extends E> collection) is implemented on top of add() in one release, but a later release first checks whether collection instanceof ArrayList and, if so, uses System.arraycopy to copy the data directly. You would then have to change your addAll() method to increment changeCount by collection.size() only if the collection is an ArrayList (otherwise it gets done in add()).
Also if a method is ever added to List (this happened with forEach() and stream() for example) this would cause problems if you were using that method name to mean something else. This can happen when extending abstract classes too, but at least an abstract class has no state, so you are less likely to be able to cause too much damage by overriding methods.
I would still use the List interface, and ideally extend AbstractList. Something like this
public final class PayloadList<E> extends AbstractList<E> implements RandomAccess {
private final ArrayList<E> list;
private final Object[] payload;
// details missing
}
That way you have a class that implements List and makes use of ArrayList without you having to worry about implementation details.
(By the way, in my opinion, the class AbstractList is amazing. You only have to override get() and size() to have a functioning List implementation and methods like containsAll(), toString() and stream() all just work.)
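As a sketch of how the AbstractList approach could look for the change counter from the question, assuming it is enough to count insertions and removals: the class name CountingList and its changeCount() accessor are invented here. AbstractList documents that add(E) calls add(size(), e), and AbstractCollection documents that addAll() adds each element in turn, so both end up in the single overridden add(int, E).

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// Hypothetical composition-based variant of the question's ChangeTrackingList.
final class CountingList<E> extends AbstractList<E> {
    private final List<E> delegate = new ArrayList<>();
    private int changeCount;

    @Override public E get(int index) { return delegate.get(index); }
    @Override public int size() { return delegate.size(); }

    // AbstractList routes add(E) and addAll(...) through this method.
    @Override public void add(int index, E element) {
        delegate.add(index, element);
        changeCount++;
    }

    @Override public E remove(int index) {
        E removed = delegate.remove(index);
        changeCount++;
        return removed;
    }

    int changeCount() { return changeCount; }
}
```

With this layout, addAll(Arrays.asList("foo", "bar")) counts two changes regardless of how ArrayList implements its own addAll().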
One aspect you should consider is that all classes that inherit from AbstractList are value classes. That means that they have meaningful equals(Object) and hashCode() methods, therefore two lists are judged to be equal even if they are not the same instance of any class.
Furthermore, the equals() contract from AbstractList allows any list to be compared with the current list - not just a list with the same implementation.
Now, if you add a value item to a value class when you extend it, you need to include that value item in the equals() and hashCode() methods. Otherwise you will allow two PayloadList lists with different payloads to be considered "the same" when somebody uses them in a map, a set, or just a plain equals() comparison in any algorithm.
But in fact, it's impossible to extend a value class by adding a value item! You'll end up breaking the equals() contract, either by breaking symmetry (A plain ArrayList containing [1,2,3] will return true when compared with a PayloadList containing [1,2,3] with a payload of [a,b,c], but the reverse comparison won't return true). Or you'll break transitivity.
This means that basically, the only proper way to extend a value class is by adding non-value behavior (e.g. a list that logs every add() and remove()). The only way to avoid breaking the contract is to use composition. And it has to be composition that does not implement List at all (because again, other lists will accept anything that implements List and gives the same values when iterating it).
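The symmetry problem can be reproduced with a deliberately naive subclass. NaivePayloadList below is a hypothetical class written only to show the failure; it is not the PayloadList from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical subclass that folds its extra state into equals(), breaking the contract.
class NaivePayloadList<E> extends ArrayList<E> {
    private final String payload;

    NaivePayloadList(List<E> items, String payload) {
        super(items);
        this.payload = payload;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof NaivePayloadList)) {
            return false; // refuses to equal a plain list
        }
        return super.equals(o) && payload.equals(((NaivePayloadList<?>) o).payload);
    }

    @Override public int hashCode() {
        return 31 * super.hashCode() + payload.hashCode();
    }
}

class SymmetryDemo {
    public static void main(String[] args) {
        List<Integer> plain = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> withPayload = new NaivePayloadList<>(Arrays.asList(1, 2, 3), "abc");
        System.out.println(plain.equals(withPayload)); // true: element-wise comparison
        System.out.println(withPayload.equals(plain)); // false: symmetry is broken
    }
}
```

The plain list only compares elements, so it accepts the subclass, while the subclass rejects the plain list.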
This answer is based on item 8 of Effective Java, 2nd Edition, by Joshua Bloch
If the class is not final, you can always extend it. Everything else is subjective and a matter of opinion.
My opinion is to favor composition over inheritance, since in the long run, inheritance produces low cohesion and high coupling, which is the opposite of a good OO design. But this is just my opinion.
The following is all just opinion; the question invites opinionated answers (I think it's borderline inappropriate for SO).
While your approach is workable in some situations, I'd argue it's bad design and very brittle. It's also pretty complicated to cover all loopholes (maybe even impossible, for example when the list is sorted by means other than List.sort).
If you require extra data to be stored and managed it would be better integrated into the list items themselves, or the data could be associated using existing collection types like Map.
If you really need an association list type, consider making it not an instance of java.util.List, but a specialized type with specialized API. That way no accidents are possible by passing it where a plain list is expected.
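A specialized type along those lines might be sketched as follows. PayloadItems and all of its method names are assumptions for illustration; the point is only that the class keeps elements and payloads in step and does not implement java.util.List.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical association type; deliberately does not implement java.util.List.
final class PayloadItems<E, P> {
    private final List<E> elements = new ArrayList<>();
    private final List<P> payloads = new ArrayList<>();

    void add(E element, P payload) {
        elements.add(element);
        payloads.add(payload); // both lists always grow together
    }

    void remove(int index) {
        elements.remove(index);
        payloads.remove(index); // and shrink together
    }

    E element(int index) { return elements.get(index); }
    P payload(int index) { return payloads.get(index); }
    int size() { return elements.size(); }

    // Read-only view for code that only needs the plain elements.
    List<E> elementsView() { return Collections.unmodifiableList(elements); }
}
```

Code that expects a plain List can be handed elementsView(), which cannot be mutated behind the payloads' back.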

List usage without specifying the Type

I came across some code in my new environment. It is as follows:
List results;
if (<Some Condition>) {
results = List<XYZ> results;
} else {
results = List<ABC> results;
}
XYZ and ABC are Hibernate Entities.
Though this works, I guess this is not a proper way to do this.
I would like to know what a better way to do it would be. I know there is no "perfect" way to do it. But this can be better.
Remember these are non-similar Entities. So I think wrapping these Entities with an Interface might not be a good idea.
Generics are a compile-time mechanism, so, if you don't know the type of object you are pulling, generics are not appropriate.
I understand that the entities are different and not correlated, but I don't understand why an interface is not a good idea. Basically, you know that you want to collect some data, according to some condition. So, just for the fact that XYZ and ABC are candidates to be type of the collected data, you do have some commonalities. In that case, you may have a
List<? extends CommonInterface>
and CommonInterface is used just here.
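A minimal sketch of that idea, using plain stand-in classes rather than real Hibernate entities. The getLabel() method and the load() helper are assumptions made for the example; the interface could just as well be an empty marker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical common type for anything the method may return; getLabel() is assumed.
interface CommonInterface {
    String getLabel();
}

class XYZ implements CommonInterface {
    @Override public String getLabel() { return "xyz"; }
}

class ABC implements CommonInterface {
    @Override public String getLabel() { return "abc"; }
}

class CommonInterfaceDemo {
    // Returns a typed list either way; callers can read elements as CommonInterface.
    static List<? extends CommonInterface> load(boolean condition) {
        if (condition) {
            List<XYZ> xyzs = new ArrayList<>();
            xyzs.add(new XYZ());
            return xyzs;
        }
        List<ABC> abcs = new ArrayList<>();
        abcs.add(new ABC());
        return abcs;
    }
}
```

The wildcard keeps the result read-only with respect to element type: callers can get() a CommonInterface but cannot add an arbitrary one.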
However, assuming XYZ and ABC are completely distinct, one more option could be to split the method in two parts and use a generic method receiving also the type of data you want to collect:
public void methodForTheCondition() {
if (<some condition>) {
List<XYZ> l = genericMethod(XYZ.class);
// do something
} else {
List<ABC> l = genericMethod(ABC.class);
// do something else, which I assume is different, otherwise opt for
// a common interface
}
}
public <T> List<T> genericMethod(Class<T> clazz) {
List<T> result = new ArrayList<T>();
return result;
}
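The genericMethod above returns an empty list. As one possible way to fill it in, the sketch below uses the Class<T> token to pick matching objects out of a mixed input. The class name, method name, and the idea of filtering an in-memory list are assumptions for the example, not how a Hibernate query would work.

```java
import java.util.ArrayList;
import java.util.List;

class TypedCollector {
    // Keeps only the candidates that are instances of clazz, typed as List<T>.
    static <T> List<T> collect(List<?> candidates, Class<T> clazz) {
        List<T> result = new ArrayList<>();
        for (Object candidate : candidates) {
            if (clazz.isInstance(candidate)) {
                result.add(clazz.cast(candidate)); // checked cast, no compiler warning
            }
        }
        return result;
    }
}
```

Because the token carries the type at runtime, the caller gets a properly typed List<T> without any unchecked casts.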
But this can be better.
What makes you believe this? Without knowing the exact condition, this looks simply like a mass-loading of items in a generic EntityManager and therefore returning a List<X> whatever X might be.
From the code's point of view, there is nothing wrong, because you are creating an untyped List and later assigning a List of a certain type to that variable.
As long as you use List as a raw type, you are able to assign any List to it. This is what interfaces are designed for (assigning a type without knowing the exact type).
Remember these are non-similar Entities. So I think wrapping these Entities with an Interface might not be a good idea.
There are a lot of interfaces out there that make perfect sense for non-similar items, starting with anything that aggregates elements (List, Map) and ending with interfaces that simply describe one thing the items have in common, e.g. Serializable, Comparable, etc.
An interface does not mean that the objects are related in some way (that is what parent/abstract classes are used for). An interface simply says that a certain functionality is implemented (hence, a class can implement multiple interfaces).

Java: Casting one subclass to another?

I'm attempting to understand something I read in a research paper that suggested that you can improve performance with shared objects in Java by making use of the RTTI facilities. The main idea is that you have a class with two empty subclasses to indicate the status of an implicit "bit" in the main class. The reference I'm looking at is in the paper here: http://cs.brown.edu/~mph/HellerHLMSS05/2005-OPODIS-Lazy.pdf in Section 3: Performance.
I am attempting to replicate this technique with a data structure that I'm working on. Basically I have:
class Node {
...
}
class ValidNode extends Node {}
class DeletedNode extends Node {}
I then create an object with:
Node node = new ValidNode();
I want to somehow cast my instance of ValidNode to an instance of DeletedNode. I've tried the following:
Node dnode = DeletedNode.class.cast( node );
and
Node dnode = (DeletedNode)node;
However, both terminate with a ClassCastException. I have to assume that what I am attempting to do is a valid technique, since the author (who in turn integrated this technique into the Java1.6 library) clearly knows what he's doing. However, I don't seem to have enough Java guru-ness to figure out what I'm missing here.
I intend to use something along the lines of
if ( node instanceof DeletedNode ) { // do stuff here
Thank you all in advance.
=============
EDIT:
It looks like the following might work:
class Node {
...
}
class ValidNode extends Node {}
I then create (un-deleted) nodes as of type ValidNode. When I wish to mark a node as deleted, I then cast the node up the chain to type Node. I can then test if a node has been deleted with if (!(node instanceof ValidNode)).
I'll give this a try.
Thing is, all Java knows at compile-time is that you've declared your item as a Node. At runtime, however, the JVM knows the actual class of your object. From that moment on, from the Java perspective, casting is only legal along the inheritance chain. Even that one might fail if you don't pay enough attention (an object built as a Node cannot be cast as a DeletedNode for example). Since your inherited Node types are sibling classes, the cast along the inheritance chain will fail and will throw the well known ClassCastException.
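The failure described above can be reproduced in a few lines. The helper names isDeleted and castFails are invented for this sketch; the three classes mirror the ones in the question.

```java
class Node {}
class ValidNode extends Node {}
class DeletedNode extends Node {}

class SiblingCastDemo {
    // True only when the runtime class of node is DeletedNode (or a subclass of it).
    static boolean isDeleted(Node node) {
        return node instanceof DeletedNode;
    }

    // Attempts the cast from the question and reports whether it failed.
    static boolean castFails(Node node) {
        try {
            DeletedNode ignored = (DeletedNode) node;
            return false;
        } catch (ClassCastException e) {
            return true; // thrown when the runtime class is not a DeletedNode
        }
    }
}
```

A cast never changes the object's runtime class; it only checks it, which is why a ValidNode can never become a DeletedNode this way.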
A cursory read of your referenced A Lazy Concurrent List-Based Set Algorithm actually points to an algorithm description in High Performance Dynamic Lock-Free Hash Tables and List-Based Sets.
Notably, the first paper states:
Achieving the effect of marking a bit in the next pointer is done more efficiently than with AtomicMarkableReference by having two trivial (empty) subclasses of each entry object and using RTTI to determine at runtime which subclass the current instance is, where each subclass represents a state of the mark bit.
Heading over to the AtomicMarkableReference documentation we see that this class stores a reference and an associated boxed boolean.
The second referenced paper shows algorithms using nominal subtypes of Node in atomic compare-and-swap operations. Notably there's no casting going on, just some instance swaps.
I could reason that using an AtomicReference might be faster than an AtomicMarkableReference because there's less stuff to get and set during CAS operations. Using subclasses might actually be faster, but code would look like:
Node deletedNode = new DeletedNode();
Node validNode = new ValidNode();
// a wildcard type cannot be instantiated, so the reference is typed as Node
AtomicReference<Node> ref = new AtomicReference<>(validNode);
...
ref.compareAndSet(validNode, deletedNode); // or some logic
As noted in the comments, there's no way to cast from one subclass to another, you cannot say an "Apple" is a "Banana" even if both are types of "Fruit". You can, however, carry around instances and swap atomic references.
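A self-contained sketch of the swap idea, assuming each node carries a key that the replacement must preserve. The key field and the markDeleted helper are assumptions for this example and are not taken from either paper.

```java
import java.util.concurrent.atomic.AtomicReference;

class Node {
    final int key;
    Node(int key) { this.key = key; }
}

class ValidNode extends Node {
    ValidNode(int key) { super(key); }
}

class DeletedNode extends Node {
    DeletedNode(int key) { super(key); }
}

class MarkBySwapDemo {
    // Replaces the expected node with a DeletedNode carrying the same key.
    // Returns false if the slot no longer holds the expected node.
    static boolean markDeleted(AtomicReference<Node> slot, Node expected) {
        return slot.compareAndSet(expected, new DeletedNode(expected.key));
    }
}
```

Readers can then test slot.get() instanceof DeletedNode to see the "mark bit" without a separate boolean.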

Declaring a LinkedList in Java

I was always taught that when we declare a collection we should write Interface ob = new Class(). So if I want to use, for example, a LinkedList, I'll write List ob = new LinkedList(), but then I can't access all the methods of LinkedList. Isn't LinkedList ob = new LinkedList() 100% correct?
Isn't LinkedList ob = new LinkedList() 100% correct?
Well I'd suggest using the generic form, but sure - if you want to use functionality which is specific to LinkedList, you need to declare the variable accordingly.
You might want to check whether the Deque<E> or Queue<E> interfaces have what you want, though. If they do, use those, in keeping with the idea of describing what you need rather than what implementation you'll use.
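For instance, if the LinkedList-only methods you need are the ones that work on both ends, declaring the variable as Deque may be enough. The ends() helper below is a hypothetical example method.

```java
import java.util.Deque;
import java.util.LinkedList;

class DequeDemo {
    // Uses only Deque methods, so any Deque implementation can be passed in.
    static String ends(Deque<String> items) {
        return items.peekFirst() + "-" + items.peekLast();
    }

    public static void main(String[] args) {
        Deque<String> items = new LinkedList<>();
        items.addFirst("b");
        items.addFirst("a");
        items.addLast("c");
        System.out.println(ends(items)); // prints a-c
    }
}
```

Switching the implementation to ArrayDeque later would not require changing ends().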
Yes,
LinkedList<...> items = new LinkedList<...>();
is perfectly correct if you know that items will depend on methods of LinkedList<T> that are not captured in the List<T> interface.
You should always try to keep the declaration at the highest level possible, meaning that you should stop at the highest level that provides all the functionality that you need: if List methods are not enough, you're perfectly fine with your LinkedList declaration.
If you actually have a need to use methods that are not on the List interface, there is certainly nothing wrong with using LinkedList's API. The general rule of programming to the List interface recognizes that 1) it's pretty rare to need those methods, and 2) in most people's experience, it's way more likely that I discover I need to sort the list and/or use a lot of random access, and decide to switch to an ArrayList, than it is I need one of the methods only LinkedList has.
It may be also that you could be programming to the Queue interface, if you find List isn't giving you what you need.
The rule "always code to interfaces" must be taken with some flexibility. What you are suggesting is fine, and as you came to the conclusion, the only option.
As a side note, coding to concrete classes like this is faster in most JVMs. Deciding whether the performance is worth breaking the rule is the hard part.
LinkedList is a generic. You should be doing:
LinkedList<String> linkedList = new LinkedList<String>();
(or whatever else you need to store in there instead of String)
Not exactly 100% correct.
A preferred way to declare any collection is to include the data type it's holding. So, for your example, it'd be LinkedList<Integer> ob = new LinkedList<Integer>();.
No, this would be wrong. If at a later stage he wants to change his implementation from a linked list to another List implementation, he will run into trouble. So it is better to declare the variable at the interface level.
I wouldn't always suggest using generics.
Sometimes you may need to wrap different objects, as here:
String str="a string";
boolean status=false;
LinkedList ll = new LinkedList();
ll.add(str);
ll.add(status);
In some situations, such as RMI, you can only send serialized data. Suppose you want to send a class object that is not serializable. There you can wrap the members of the class (primitives) in a LinkedList and pass that object as a whole, without worrying about the huge number of arguments.
Consider, for example:
public class DataHouse
{
public int a;
public String str;
.
.
.
}
Now, somewhere you need to pass the objects.
You can do the following:
DataHouse dh =new DataHouse();
LinkedList ll = new LinkedList();
ll.add(dh.a);
ll.add(dh.str);
// Now the content is serializable and can be passed as one encapsulated object
You can still access LinkedList methods through a List reference; all you have to do is cast,
for example
((LinkedList) ob).add()
The point of using the general List type and not LinkedList is that if you simply change the type of list you are using (say, to a doubly linked list), your program will still work. The general type makes your code more portable and more changeable.
Actually, it would be better if it were parameterized, as both are raw types.
