I'm attempting to understand something I read in a research paper that suggested that you can improve performance with shared objects in Java by making use of the RTTI facilities. The main idea is that you have a class with two empty subclasses to indicate the status of an implicit "bit" in the main class. The reference I'm looking at is in the paper here: http://cs.brown.edu/~mph/HellerHLMSS05/2005-OPODIS-Lazy.pdf in Section 3: Performance.
I am attempting to replicate this technique with a data structure that I'm working on. Basically I have:
class Node {
...
}
class ValidNode extends Node {}
class DeletedNode extends Node {}
I then create an object with:
Node item = new ValidNode();
I want to somehow cast my instance of ValidNode to an instance of DeletedNode. I've tried the following:
Node dnode = DeletedNode.class.cast( node );
and
Node dnode = (DeletedNode)node;
Both, however, terminate with a ClassCastException. I have to assume that what I am attempting to do is a valid technique, since the author (who in turn integrated this technique into the Java 1.6 library) clearly knows what he's doing. I just don't seem to have enough Java guru-ness to figure out what I'm missing here.
I intend to use something along the lines of
if ( node instanceof DeletedNode ) { // do stuff here
}
Thank you all in advance.
=============
EDIT:
It looks like the following might work:
class Node {
...
}
class ValidNode extends Node {}
I then create (un-deleted) nodes of type ValidNode. When I wish to mark a node as deleted, I cast the node up the chain to type Node. I can then test whether a node has been deleted with if (!(node instanceof ValidNode)).
I'll give this a try.
Thing is, all Java knows at compile-time is that you've declared your item as a Node. At runtime, however, the JVM knows the actual class of your object. From that moment on, from the Java perspective, casting is only legal along the inheritance chain. Even that one might fail if you don't pay enough attention (an object built as a Node cannot be cast as a DeletedNode for example). Since your inherited Node types are sibling classes, the cast along the inheritance chain will fail and will throw the well known ClassCastException.
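To make the rules concrete, here is a minimal sketch using the Node classes from the question; a cast never changes the runtime class of the object, it only re-labels the reference:

Node n = new ValidNode();        // static type Node, runtime class ValidNode

Node asNode = (Node) n;          // fine: upcast along the inheritance chain
ValidNode v = (ValidNode) n;     // fine: downcast matches the runtime class
DeletedNode d = (DeletedNode) n; // ClassCastException: the runtime class is
                                 // ValidNode, a sibling of DeletedNode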
A cursory read of your referenced "A Lazy Concurrent List-Based Set Algorithm" actually points to an algorithm description in "High Performance Dynamic Lock-Free Hash Tables and List-Based Sets".
Notably, the first paper states:
Achieving the effect of marking a bit in the next pointer is done more efficiently than with AtomicMarkableReference by having two trivial (empty) subclasses of each entry object and using RTTI to determine at runtime which subclass the current instance is, where each subclass represents a state of the mark bit.
Heading over to the AtomicMarkableReference documentation, we see that this class maintains an object reference together with a mark bit (a boolean) that can be updated atomically.
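For contrast, here is a short sketch of the AtomicMarkableReference API in use (the successor variable is illustrative, not from the paper's code):

import java.util.concurrent.atomic.AtomicMarkableReference;

// a reference to a successor node plus a logical-deletion mark, updated atomically
AtomicMarkableReference<Node> next = new AtomicMarkableReference<>(successor, false);

boolean[] markHolder = new boolean[1];
Node succ = next.get(markHolder);            // read reference and mark together
boolean marked = markHolder[0];

// logically delete: keep the same reference, flip the mark from false to true
next.compareAndSet(succ, succ, false, true);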
The second referenced paper shows algorithms using nominal subtypes of Node in atomic compare-and-swap operations. Notably there's no casting going on, just some instance swaps.
I could reason that using an AtomicReference might be faster than an AtomicMarkableReference because there's less state to get and set during CAS operations. Using subclasses might actually be faster, but the code would look like:
Node deletedNode = new DeletedNode();
Node validNode = new ValidNode();
// note: new AtomicReference<? extends Node>() would not compile; a wildcard
// type cannot be instantiated, so declare the reference with the base type
AtomicReference<Node> ref = new AtomicReference<>(validNode);
...
ref.compareAndSet(validNode, deletedNode); // or some logic
As noted in the comments, there's no way to cast from one subclass to another, you cannot say an "Apple" is a "Banana" even if both are types of "Fruit". You can, however, carry around instances and swap atomic references.
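Putting it together, here is a hedged sketch of the subclass trick; the key field and the markDeleted helper are my own invention, not the paper's code. To "flip the bit" you never cast: you build a replacement instance of the other subclass and CAS it in, and to "read the bit" you use instanceof:

import java.util.concurrent.atomic.AtomicReference;

class Node {
    final int key;
    final AtomicReference<Node> next = new AtomicReference<>();
    Node(int key) { this.key = key; }
}
class ValidNode extends Node { ValidNode(int key) { super(key); } }
class DeletedNode extends Node { DeletedNode(int key) { super(key); } }

class Marking {
    // "flip the bit": build a replacement of the other subclass and swap it in
    static boolean markDeleted(AtomicReference<Node> ref) {
        Node current = ref.get();
        if (!(current instanceof ValidNode)) return false; // already deleted
        Node replacement = new DeletedNode(current.key);   // copy the fields
        replacement.next.set(current.next.get());
        return ref.compareAndSet(current, replacement);    // swap, never cast
    }
}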
The problem
Consider an implementation of a graph, SampleGraph<N>.
Consider an implementation of the graph nodes, Node extends N, correctly overriding hashCode and equals to mirror logical equality between two nodes.
Now, let's say we want to add some property p to a node. Such a property is bound to logical instances of a node, i.e. for Node n1, n2, n1.equals(n2) implies p(n1) = p(n2).
If I simply add the property as a field of the Node class, this has happened to me:
I define Node n1, n2 such that n1.equals(n2) but n1 != n2
I add n1 and n2 to a graph: n1 when inserting the logical node, and n2 when referring to the node during insertion of edges. The graph stores both instances.
Later, I retrieve the node from the graph (n1 is returned) and set the property p on it to some value. Later, I traverse all the edges of the graph, and retrieve the node from one of them (n2 is returned). The property p is not set, causing a logical error in my model.
To summarize, current behavior:
graph.addNode(n1) // n1 is added
graph.addEdge(n2,nOther) // graph stores n2
graph.queryForNode({query}) // n1 is returned
graph.queryForEdge({query}).sourceNode() // n2 is returned
The question
All the following statements seem reasonable to me. None of them fully convinces me over the others, so I'm looking for best practice guidelines based on software engineering canons.
S1 - The graph implementation is poor. Upon adding a node, the graph should always internally check if it has an instance of the same node (equals evaluates to true) memorized. If so, such instance should always be the only reference used by the graph.
graph.addNode(n1) // n1 is added
graph.addEdge(n2,nOther) // graph internally checks that n2.equals(n1), doesn't store n2
graph.queryForNode({query}) // n1 is returned
graph.queryForEdge({query}).sourceNode() // n1 is returned
S2 - Assuming the graph behaves as in S1 is a mistake. The programmer should take care that the same instance of a node is always passed to the graph.
graph.addNode(n1) // n1 is added
graph.addEdge(n1,nOther) // the programmer uses n1 every time he refers to the node
graph.queryForNode({query}) // n1 is returned
graph.queryForEdge({query}).sourceNode() // n1 is returned
S3 - The property is not implemented the right way. It should be information external to the class Node. A collection, such as a HashMap<N, Property>, would work just fine, treating different instances as the same logical node based on equals/hashCode.
HashMap<N, Property> properties;
graph.addNode(n1) // n1 is added
graph.addEdge(n2,nOther) // graph stores n2
graph.queryForNode({query}) // n1 is returned
graph.queryForEdge({query}).sourceNode() // n2 is returned
// get the property. Difference in instances does not matter
properties.get(n1)
properties.get(n2) //same property is returned
S4 - Same as S3, but we could hide the implementation inside Node, this way:
class Node {
    // keyed on the node itself; lookups rely on equals/hashCode
    private static final Map<Node, Property> properties = new HashMap<>();
    public Property getProperty() {
        return properties.get(this);
    }
}
Edit: added code snippets for current behavior and tentative solutions following Stephen C's answer. To clarify, the whole example comes from using a real graph data structure from an open source Java project.
It seems that S1 makes the most sense. Some Graph implementations internally use a Set<Node> (or some equivalent) to store the nodes. Using a structure like a Set ensures that there are no duplicate Nodes, where Node n1 and Node n2 are considered duplicates if and only if n1.equals(n2). Of course, the implementation of Node should ensure that all relevant properties are considered when comparing two instances (i.e. when implementing equals() and hashCode()).
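As a rough illustration, here is a minimal sketch of what S1 could look like inside the graph; the intern method and the canonical map are my own naming, not from any particular library:

import java.util.HashMap;
import java.util.Map;

class SampleGraph<N> {
    // canonical store: maps each logical node to the single instance the graph keeps
    private final Map<N, N> canonical = new HashMap<>();

    // returns the instance the graph actually uses for this logical node
    private N intern(N node) {
        N existing = canonical.putIfAbsent(node, node);
        return existing != null ? existing : node;
    }

    public void addEdge(N source, N target) {
        N s = intern(source); // if n2.equals(n1), the stored n1 is reused
        N t = intern(target);
        // ... store the edge between s and t
    }
}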
Some of the issues with the other statements:
S2, while perhaps reasonable, yields an implementation in which the burden falls to the client to understand and safeguard against a potential pitfall of the internal Graph implementation, which is a clear sign of a poorly designed API for the Graph object.
S3 and S4 both seem weird, although perhaps I don't quite understand the situation. In general, if a Node holds on to some data, it seems perfectly reasonable to define a member variable inside class Node to reflect that. Why should this extra property be treated any differently?
To my mind, it comes down to choosing between APIs with strong or weak abstraction.
If you choose strong abstraction, the API would hide the fact that Node objects have identity, and would canonicalize them when they are added to the SampleGraph.
If you choose weak abstraction, the API would assume that Node objects have identity, and it would be up to the caller to canonicalize them before adding them to the SampleGraph.
The two approaches lead to different API contracts and require different implementation strategies. The choice is likely to have performance implications ... if that is significant.
Then there are finer details of the API design that may or may not match your specific use-case for the graphs.
The point is that you need to make the choice.
(This is a bit like deciding between the collections List interface with its clean model, versus implementing your own linked list data structure so that you can efficiently "splice" two lists together. Either approach could be correct, depending on the requirements of your application.)
Note that you usually can make a choice, though the choice may be a difficult one. For example, if you are using an API designed by someone else:
You can choose to use it as-is. (Suck it up!)
You can choose to try to influence the design. (Good luck!)
You can choose to switch to a different API; i.e. a different vendor.
You can choose to fork the API and adjust it to your own requirements (or preferences, if that is what this is about).
You can choose to design and implement your own API from scratch.
And if you really don't have a choice, then this question is moot. Just use the API.
If this is an open-source API then you probably don't have the option of getting the designers to change it. Significant API overhauls have a tendency to create a lot of work for other people; i.e. the many other projects that depend on the API. A responsible API designer / design team takes this into account. Or else they find that they lose relevance because their APIs get a reputation for being unstable.
So ... if you are aiming to influence an existing open-source API design ... 'cos you think they are doing it incorrectly (for some definition of incorrect) ... you are probably better off "forking" the API and dealing with the consequences.
And finally, if you are looking for "best practice" advice, be aware that there are no universal best practices. This is not just a philosophical quibble: asking for "best practice" advice and then following it blindly is a good way to get burned.
As a footnote: have you ever wondered why the Java and Android standard class libraries don't offer any general-purpose graph APIs or implementations? And why they took such a long time to appear in 3rd party libraries (Guava version 20.0)?
The answer is that there is no consensus on what such an API should be like. There are just too many conflicting use-cases and requirement sets.
I have seen some people use the topmost parent class as the variable type to hold a child instance, while others use the immediate parent class. For example:
Collection obj = new ArrayList();
Or
List obj = new ArrayList();
Here, since List is a subtype of Collection, can't we always use the first line instead of the second?
More generally, couldn't we use a Collection reference everywhere in the collections framework to hold any instance of a class under Collection?
Is that good practice?
I wanted to know which of the two is considered best practice, and why.
A technical justification (performance concerns, etc.) would be greatly appreciated.
It really depends on your needs. In your example it doesn't change much for basic use, but if you inspect the two interfaces there are some differences. Look:
https://docs.oracle.com/javase/7/docs/api/java/util/Collection.html
and
https://docs.oracle.com/javase/7/docs/api/java/util/List.html
Notice that List gives you access to methods that Collection doesn't.
set(int index, E element) for instance is defined in the List interface and not in Collection.
This is because not every class implementing Collection needs the same methods.
Performance-wise, it has no impact.
Always use the highest type that has all the functionality you need. For your example there is no need to go higher than List.
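A minimal sketch of the difference:

Collection<String> c = new ArrayList<>();
List<String> l = new ArrayList<>();

c.add("a");       // fine: add() is declared on Collection
l.add("a");
l.set(0, "b");    // fine: set() is declared on List
// c.set(0, "b"); // does not compile: Collection has no set(int, E)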
There is no so-called "best practice" for choosing the class to use as the reference type. In fact, the class at the top of the hierarchy is Object. Do you use Object as the reference type for everything? No. Generally, choose the highest class that has all the methods you need.
Instead of following the so called "best practice", apply what suits best for your situation.
These are some pros and cons of using a class higher in the hierarchy as the reference type:
Advantage
Allows grouping of objects which share the same ancestor (super class)
Allows any instance of a subclass to be assigned to it
Animal dog = new Dog();
Animal cat = new Cat();
Allows polymorphism
dog.makeNoise();
cat.makeNoise();
It is only an advantage when you are accessing common behaviours or members.
Disadvantage
Requires casting when you are accessing behaviours which exist in one object but not the other.
dog.swim(); // error: class Animal does not declare swim()
((Dog)dog).swim();
As you start dumping various objects in the common parent class, you may have a hard time trying to figure out which members belong to which class.
((Cat)cat).swim(); // error: class Cat does not have swim()
The general idea is to hide as much as you can, so things are easier to change. If you need indexing, for instance (List.get(int index)), then it MUST be a List, because Collection does not support get(index). If you don't need indexing, then hiding the fact that you're using a List means you can later switch to another collection that isn't a List without any trouble.
For example, maybe one month later I want to use a Set instead of a List. But Set doesn't support get(index), so anybody who used the indexing features of the List would make the switch difficult: everywhere someone used get() would break.
On the other hand, excessively hiding your types can cause accidental performance issues because a consumer of your method doesn't know the concrete type. Suppose you return a List that's actually a LinkedList (where indexing is O(n)), and suppose the consumer of this list does an indexed lookup for each entry of another list. That can be O(n*m), which is really slow. If you advertised that it was a LinkedList in the first place, the consumer would realize that it's probably not a good idea to index repeatedly into this list, and would make a local copy instead.
Library code (suppose the one you're designing)
public class Util {
public static List<String> makeData() {
return new LinkedList<>(Arrays.asList("dogs", "cats", "zebras", "deer"));
}
}
Caller's code (suppose the one that's using your library or method)
public static void main(String [] args) {
List<String> data = Util.makeData();
int [] indicesToLookUp = {1,4,2,3,0};
for( int idx : indicesToLookUp ) {
if(idx < data.size()) {
// each index (LinkedList.get()) is slow O(N)
doSomethingWithEntry(idx, data.get(idx));
}
}
}
You could argue it's the caller's fault because he incorrectly assumed the List is an ArrayList<> and should have made a local copy of the list.
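If the caller can't rely on the concrete type, one defensive option is to copy once up front:

// one O(n) copy, after which every get(idx) is O(1)
List<String> data = new ArrayList<>(Util.makeData());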
The question is framed for List but easily applies to others in the java collections framework.
For example, I would say it is certainly appropriate to make a new List sub-type to store something like a counter of additions since it is an integral part of the list's operation and doesn't alter that it "is a list". Something like this:
public class ChangeTrackingList<E> extends ArrayList<E> {
private int changeCount;
...
@Override public boolean add(E e) {
changeCount++;
return super.add(e);
}
// other methods likewise overridden as appropriate to track change counter
}
However, what about adding additional functionality out of the knowledge of a list ADT, such as storing arbitrary data associated with a list element? Assuming the associated data was properly managed when elements are added and removed, of course. Something like this:
public class PayloadList<E> extends ArrayList<E> {
private Object[] payload;
...
public void setData(int index, Object data) {
... // manage 'payload' array
payload[index] = data;
}
public Object getData(int index) {
... // manage 'payload' array, error handling, etc.
return payload[index];
}
}
In this case I have altered that it is "just a list" by adding not only additional functionality (behavior) but also additional state. Certainly part of the purpose of type specification and inheritance, but is there an implicit restriction (taboo, deprecation, poor practice, etc.) on Java collections types to treat them specially?
For example, when referencing this PayloadList as a java.util.List, one will mutate and iterate as normal and ignore the payload. This is problematic when it comes to something like persistence or copying which does not expect a List to carry additional data to be maintained. I've seen many places that accept an object, check to see that it "is a list" and then simply treat it as java.util.List. Should they instead allow arbitrary application contributions to manage specific concrete sub-types?
Since this implementation would constantly produce issues in instance slicing (ignoring sub-type fields) is it a bad idea to extend a collection in this way and always use composition when there is additional data to be managed? Or is it instead the job of persistence or copying to account for all concrete sub-types including additional fields?
This is purely a matter of opinion, but personally I would advise against extending classes like ArrayList in almost all circumstances, and favour composition instead.
Even your ChangeTrackingList is rather dodgy. What does
list.addAll(Arrays.asList("foo", "bar"));
do? Does it increment changeCount twice, or not at all? It depends on whether ArrayList.addAll() uses add(), which is an implementation detail you should not have to worry about. You would also have to keep your methods in sync with the ArrayList methods. For example, at present addAll(Collection<?> collection) is implemented on top of add(), but if they decided in a future release to check first if collection instanceof ArrayList, and if so use System.arraycopy to directly copy the data, you would have to change your addAll() method to only increment changeCount by collection.size() if the collection is an ArrayList (otherwise it gets done in add()).
Also if a method is ever added to List (this happened with forEach() and stream() for example) this would cause problems if you were using that method name to mean something else. This can happen when extending abstract classes too, but at least an abstract class has no state, so you are less likely to be able to cause too much damage by overriding methods.
I would still use the List interface, and ideally extend AbstractList. Something like this
public final class PayloadList<E> extends AbstractList<E> implements RandomAccess {
private final ArrayList<E> list;
private final Object[] payload;
// details missing
}
That way you have a class that implements List and makes use of ArrayList without you having to worry about implementation details.
(By the way, in my opinion, the class AbstractList is amazing. You only have to override get() and size() to have a functioning List implementation and methods like containsAll(), toString() and stream() all just work.)
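To see how little is needed, here is a toy sketch (unrelated to PayloadList): a read-only list over the integers [0, n) where get() and size() are the only methods we write:

import java.util.AbstractList;

class RangeList extends AbstractList<Integer> {
    private final int n;
    RangeList(int n) { this.n = n; }
    @Override public Integer get(int index) { return index; }
    @Override public int size() { return n; }
}

// contains(), toString(), stream(), equals(), ... all work "for free":
// new RangeList(5) prints as [0, 1, 2, 3, 4]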
One aspect you should consider is that all classes that inherit from AbstractList are value classes. That means that they have meaningful equals(Object) and hashCode() methods, therefore two lists are judged to be equal even if they are not the same instance of any class.
Furthermore, the equals() contract from AbstractList allows any list to be compared with the current list - not just a list with the same implementation.
Now, if you add a value item to a value class when you extend it, you need to include that value item in the equals() and hashCode() methods. Otherwise you will allow two PayloadList lists with different payloads to be considered "the same" when somebody uses them in a map, a set, or just a plain equals() comparison in any algorithm.
But in fact, it's impossible to extend a value class by adding a value item! You'll end up breaking the equals() contract, either by breaking symmetry (a plain ArrayList containing [1,2,3] will return true when compared with a PayloadList containing [1,2,3] with a payload of [a,b,c], but the reverse comparison won't) or by breaking transitivity.
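A quick sketch of the symmetry break, assuming a hypothetical PayloadList whose equals() was extended to also compare payloads:

List<Integer> plain = new ArrayList<>(Arrays.asList(1, 2, 3));
PayloadList<Integer> withPayload = new PayloadList<>();
withPayload.addAll(Arrays.asList(1, 2, 3)); // same elements, plus payload data

plain.equals(withPayload);  // true: AbstractList.equals() compares elements only
withPayload.equals(plain);  // false: the payload comparison fails
// a.equals(b) != b.equals(a): symmetry, and hence the contract, is broken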
This means that basically, the only proper way to extend a value class is by adding non-value behavior (e.g. a list that logs every add() and remove()). The only way to avoid breaking the contract is to use composition. And it has to be composition that does not implement List at all (because again, other lists will accept anything that implements List and gives the same values when iterating it).
This answer is based on item 8 of Effective Java, 2nd Edition, by Joshua Bloch
If the class is not final, you can always extend it. Everything else is subjective and a matter of opinion.
My opinion is to favor composition over inheritance, since in the long run, inheritance produces low cohesion and high coupling, which is the opposite of a good OO design. But this is just my opinion.
The following is all just opinion; the question invites opinionated answers (I think it's borderline inappropriate for SO).
While your approach is workable in some situations, I'd argue it's bad design and very brittle. It's also pretty complicated to cover all the loopholes (maybe even impossible, for example when the list is sorted by means other than List.sort).
If you require extra data to be stored and managed it would be better integrated into the list items themselves, or the data could be associated using existing collection types like Map.
If you really need an association list type, consider making it not an instance of java.util.List, but a specialized type with specialized API. That way no accidents are possible by passing it where a plain list is expected.
I haven't written any Java code in more than 10 years. I'm enjoying it, but I don't think I get some of the details of polymorphic programming. I have an abstract Node class, which has tag and data subclasses (among others), and I store them in an ArrayList.
But when I get them out of the ArrayList via an Iterator, I get Node objects back. I'm not sure how best to deal with the objects I get back from the iterator.
Here's an example:
// initialize the list
TagNode tag = new TagNode();
ArrayList<Node> list = new ArrayList<>();
list.add(tag);
// And many more go into the list, some TagNodes, some DataNodes, etc.
and later I use an iterator to process them:
Iterator<Node> i = list.iterator();
Node n = i.next();
// How do I tell if n is a TagNode or a DataNode?
I know that I can cast to one of the Node subclasses, but how do I know which subclass to use? Do I need to embed type information inside the Node classes?
You should not need to know which child class is which, in most circumstances.
That is precisely the advantage with polymorphism.
If your hierarchical design is solid, the Node will have all the behaviors (== methods) needed to perform operations on your List items without worrying about which child class they are an instance of: overridden methods resolve at runtime.
In some cases, you might want to use the instanceof operator to actually check which child class your Node belongs to, but I would consider it a rare case, best to be avoided in general principles.
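For illustration, here is a hedged sketch of the polymorphic approach described above, using the Node hierarchy and list from the question; the render() method is invented for the example:

abstract class Node {
    abstract String render(); // each subclass supplies its own behavior
}
class TagNode extends Node {
    @Override String render() { return "<tag>"; }
}
class DataNode extends Node {
    @Override String render() { return "data"; }
}

// the loop never needs to know which subclass it is holding
for (Node n : list) {
    System.out.println(n.render()); // the override is resolved at runtime
}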
Ideally you don't want to treat them differently, but if you wanted to determine the type, you can check using instanceof:
Node n = i.next();
if (n instanceof TagNode) {
// behavior
}
As others said, explicitly checking which subclass your object belongs to should not be necessary and is also considered bad style. But if you really need it, you can use the instanceof operator.
The polymorphic behavior means that you don't really need to know what type of Node it is. You simply call the API, and the behavior follows the implementation of the concrete type.
But if you really need to know, one way is to use instanceof to test the exact type. For example:
Node n = i.next();
if (n instanceof TagNode) {
    // handle tag node
} else if (n instanceof DataNode) {
    // handle data node
}
I have a custom DefaultMutableTreeNode class that is designed to support robust connections between many types of data attributes (for me those attributes could be strings, user-defined tags, or timestamps).
As I aggregate data, I'd like to give the user a live preview of the stored data we've seen so far. For efficiency reasons, I'd like to only keep one copy of a particular attribute, that may have many connections to other attributes.
Example: The user-defined tag "LOL" occurs at five different times (represented by TimeStamps). So my JTree (the class that is displaying this information) will have five parent nodes (one for each time that tag occurred). Those parent nodes should ALL SHARE ONE INSTANCE of the DefaultMutableTreeNode defined for the "LOL" tag.
Unfortunately, using the add(MutableTreeNode newChild) REMOVES newChild from WHATEVER the current parent node is. That's really too bad, since I want ALL of the parent nodes to have THE SAME child node.
Here is a picture of DOING IT WRONG (Curtis is the author and he should appear FOR ALL THE SHOWS): [screenshot omitted]
How can I accomplish this easily in Java?
Update
I've been looking at the code for DefaultMutableTreeNode.add()... and I'm surprised it works the way it does (comments are mine):
public void add(MutableTreeNode child)
{
if (! allowsChildren)
throw new IllegalStateException();
if (child == null)
throw new IllegalArgumentException();
if (isNodeAncestor(child))
throw new IllegalArgumentException("Cannot add ancestor node.");
// one of these two lines contains the magic that prevents a single "pointer" from being
// a child within MANY DefaultMutableTreeNode Vector<MutableTreeNode> children arrays...
children.add(child); // just adds a pointer to the child to the Vector array?
child.setParent(this); // this just sets the parent for this particular instance
}
If you want easily, you should probably give up on sharing the actual TreeNodes themselves. The whole model is built on the assumption that each node has only one parent. I'd focus instead on designing your custom TreeNode so that multiple nodes can all read their data from the same place, thereby keeping them synced.
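A minimal sketch of that idea, with an invented TagData payload class; every node that should stay in sync wraps the same mutable object:

import javax.swing.tree.DefaultMutableTreeNode;

public class SharedPayloadDemo {
    // one mutable payload object, shared by every tree node that displays it
    static final class TagData {
        String label;
        TagData(String label) { this.label = label; }
        @Override public String toString() { return label; } // what the JTree renders
    }

    public static void main(String[] args) {
        TagData lol = new TagData("LOL");
        // each parent gets its own node object, but all nodes share the payload
        DefaultMutableTreeNode underShow1 = new DefaultMutableTreeNode(lol);
        DefaultMutableTreeNode underShow2 = new DefaultMutableTreeNode(lol);

        lol.label = "LOL (edited)"; // both nodes now render the new label
        System.out.println(underShow1 + " / " + underShow2);
    }
}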
I'm not sure it qualifies as easy, but you might look at Creating a Data Model by implementing TreeModel, which "does not require that nodes be represented by DefaultMutableTreeNode objects, or even that nodes implement the TreeNode interface." In addition to the tutorial example, there's a file system example cited here.
Unfortunately, I believe the answer is no. In order to do what you're talking about, you would need to have DefaultMutableTreeNode's internal userObject be a pointer to some String, so that all the corresponding DefaultMutableTreeNode's could point to and share the same String object.
However, you can't call DefaultMutableTreeNode.setUserObject() with any such String pointer, because Java does not have such a concept on the level that C or C++ does. Check out this outstanding blog article on the confusing misconceptions about pass-by-value and pass-by-reference in Java.
Update: Responding to your comment here in the answer space, so I can include a code example. Yes, it's true that Java works with pointers internally... and sometimes you have to clone an object reference to avoid unwanted changes to the original. However, to make a long story short (read the blog article above), this isn't one of those occasions.
public static void main(String[] args) {
// This HashMap is a simplification of your hypothetical collection of values,
// shared by all DefaultMutableTreeNode's
HashMap<String, String> masterObjectCollection = new HashMap<String, String>();
masterObjectCollection.put("testString", "The original string");
// Here's a simplification of some other method elsewhere making changes to
// an object in the master collection
modifyString(masterObjectCollection.get("testString"));
// You're still going to see the original String printed. When you called
// that method, a reference to your object was passed by value... the ultimate
// result being that the original object in your master collection does
// not get changed based on what happens in that other method.
System.out.println(masterObjectCollection.get("testString"));
}
private static void modifyString(String theString) {
theString += "... with its value modified";
}
You might want to check out the JIDE Swing extensions, of which some components are commercial while others are open source and free as in beer. You might find some kind of component that comes closer to accomplishing exactly what you want.