I have a custom DefaultMutableTreeNode class that is designed to support robust connections between many types of data attributes (for me those attributes could be strings, user-defined tags, or timestamps).
As I aggregate data, I'd like to give the user a live preview of the stored data we've seen so far. For efficiency reasons, I'd like to only keep one copy of a particular attribute, that may have many connections to other attributes.
Example: The user-defined tag "LOL" occurs at five different times (represented by TimeStamps). So my JTree (the class that is displaying this information) will have five parent nodes (one for each time that tag occurred). Those parent nodes should ALL SHARE ONE INSTANCE of the DefaultMutableTreeNode defined for the "LOL" tag.
Unfortunately, using the add(MutableTreeNode newChild) REMOVES newChild from WHATEVER the current parent node is. That's really too bad, since I want ALL of the parent nodes to have THE SAME child node.
Here is a picture of DOING IT WRONG (Curtis is the author and he should appear FOR ALL THE SHOWS):
How can I accomplish this easily in Java?
Update
I've been looking at the code for DefaultMutableTreeNode.add()... and I'm surprised it works the way it does (comments are mine):
public void add(MutableTreeNode child)
{
    if (! allowsChildren)
        throw new IllegalStateException();
    if (child == null)
        throw new IllegalArgumentException();
    if (isNodeAncestor(child))
        throw new IllegalArgumentException("Cannot add ancestor node.");
    // one of these two lines contains the magic that prevents a single "pointer" from being
    // a child within MANY DefaultMutableTreeNode Vector<MutableTreeNode> children arrays...
    children.add(child);   // just adds a pointer to the child to the Vector array?
    child.setParent(this); // this just sets the parent for this particular instance
}
If you want easily, you should probably give up on sharing the actual TreeNodes themselves. The whole model is built on the assumption that each node has only one parent. I'd focus instead on designing your custom TreeNode so that multiple nodes can all read their data from the same place, thereby keeping them synced.
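To make that concrete, here is a minimal sketch of the "read from the same place" idea. The class and field names (`SharedData`, `SharedNode`) are hypothetical; the point is that each parent gets its own lightweight tree node, but all of those nodes wrap one shared payload object, so a change made through any of them is visible through all of them.

```java
import javax.swing.tree.DefaultMutableTreeNode;

// Hypothetical shared payload: one instance per logical attribute (e.g. the "LOL" tag).
class SharedData {
    String value;
    SharedData(String value) { this.value = value; }
    @Override public String toString() { return value; }
}

// One SharedNode per parent, but all wrap the same SharedData instance.
class SharedNode extends DefaultMutableTreeNode {
    SharedNode(SharedData data) { super(data); }
    SharedData getData() { return (SharedData) getUserObject(); }
}

public class SharedNodeDemo {
    public static void main(String[] args) {
        SharedData lolTag = new SharedData("LOL");

        SharedNode child1 = new SharedNode(lolTag);
        SharedNode child2 = new SharedNode(lolTag);

        DefaultMutableTreeNode parent1 = new DefaultMutableTreeNode("10:00");
        DefaultMutableTreeNode parent2 = new DefaultMutableTreeNode("11:30");
        parent1.add(child1);   // each child has exactly one parent, as the model requires
        parent2.add(child2);

        // Renaming the tag through one node is visible through the other.
        child1.getData().value = "LOL (renamed)";
        System.out.println(child2.getData());  // prints: LOL (renamed)
    }
}
```

The tree structure stays a proper tree (one parent per node), while the data itself lives in exactly one place.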
I'm not sure it qualifies as easy, but you might look at Creating a Data Model by implementing TreeModel, which "does not require that nodes be represented by DefaultMutableTreeNode objects, or even that nodes implement the TreeNode interface." In addition to the tutorial example, there's a file system example cited here.
Unfortunately, I believe the answer is no. In order to do what you're talking about, you would need to have DefaultMutableTreeNode's internal userObject be a pointer to some String, so that all the corresponding DefaultMutableTreeNode's could point to and share the same String object.
However, you can't call DefaultMutableTreeNode.setUserObject() with any such String pointer, because Java does not have such a concept on the level that C or C++ does. Check out this outstanding blog article on the confusing misconceptions about pass-by-value and pass-by-reference in Java.
Update: Responding to your comment here in the answer space, so I can include a code example. Yes, it's true that Java works with pointers internally... and sometimes you have to clone an object reference to avoid unwanted changes to the original. However, to make a long story short (read the blog article above), this isn't one of those occasions.
public static void main(String[] args) {
    // This HashMap is a simplification of your hypothetical collection of values,
    // shared by all DefaultMutableTreeNodes
    HashMap<String, String> masterObjectCollection = new HashMap<String, String>();
    masterObjectCollection.put("testString", "The original string");
    // Here's a simplification of some other method elsewhere making changes to
    // an object in the master collection
    modifyString(masterObjectCollection.get("testString"));
    // You're still going to see the original String printed. When you called
    // that method, a reference to your object was passed by value... the ultimate
    // result being that the original object in your master collection does
    // not get changed based on what happens in that other method.
    System.out.println(masterObjectCollection.get("testString"));
}

private static void modifyString(String theString) {
    theString += "... with its value modified";
}
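For contrast, here is a sketch of how sharing does work in Java: if the value in the master collection is a mutable object (the `Holder` wrapper below is a hypothetical stand-in), mutating it through any copy of the reference is visible everywhere. Reassigning a parameter does nothing; mutating the object it points to does.

```java
import java.util.HashMap;

public class SharedMutableDemo {
    // Hypothetical mutable wrapper; unlike String, its contents can change in place.
    static class Holder {
        String value;
        Holder(String value) { this.value = value; }
    }

    public static void main(String[] args) {
        HashMap<String, Holder> masterObjectCollection = new HashMap<String, Holder>();
        masterObjectCollection.put("testString", new Holder("The original string"));

        // The reference is still passed by value, but both copies of the
        // reference point at the same Holder, so the mutation is visible.
        modifyHolder(masterObjectCollection.get("testString"));

        System.out.println(masterObjectCollection.get("testString").value);
        // prints: The original string... with its value modified
    }

    static void modifyHolder(Holder h) {
        h.value += "... with its value modified";
    }
}
```

This is exactly why the "multiple nodes read from one shared object" approach from the earlier answer works, even though Java is strictly pass-by-value.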
You might want to check out the JIDE Swing extensions, of which some components are commercial while others are open source and free as in beer. You might find some kind of component that comes closer to accomplishing exactly what you want.
Related
The problem
Consider an implementation of a graph, SampleGraph<N>.
Consider an implementation of the graph nodes, Node extends N, correctly overriding hashCode and equals to mirror logical equality between two nodes.
Now, let's say we want to add some property p to a node. Such a property is bound to logical instances of a node, i.e. for Node n1, n2, n1.equals(n2) implies p(n1) = p(n2)
If I simply add the property as a field of the Node class, this has happened to me:
I define Node n1, n2 such that n1.equals(n2) but n1 != n2
I add n1 and n2 to a graph: n1 when inserting the logical node, and n2 when referencing the node during insertion of edges. The graph stores both instances.
Later, I retrieve the node from the graph (n1 is returned) and set the property p on it to some value. Later, I traverse all the edges of the graph, and retrieve the node from one of them (n2 is returned). The property p is not set, causing a logical error in my model.
To summarize, current behavior:
graph.addNode(n1) // n1 is added
graph.addEdge(n2,nOther) // graph stores n2
graph.queryForNode({query}) // n1 is returned
graph.queryForEdge({query}).sourceNode() // n2 is returned
The question
All the following statements seem reasonable to me. None of them fully convinces me over the others, so I'm looking for best practice guidelines based on software engineering canons.
S1 - The graph implementation is poor. Upon adding a node, the graph should always internally check if it has an instance of the same node (equals evaluates to true) memorized. If so, such instance should always be the only reference used by the graph.
graph.addNode(n1) // n1 is added
graph.addEdge(n2,nOther) // graph internally checks that n2.equals(n1), doesn't store n2
graph.queryForNode({query}) // n1 is returned
graph.queryForEdge({query}).sourceNode() // n1 is returned
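A minimal sketch of what S1's internal check could look like (the class and method names here are hypothetical, not from any particular graph library): the graph interns nodes in a map, so that equal nodes collapse to one canonical instance.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of S1: the graph keeps one canonical instance per logical node.
class SampleGraph<N> {
    private final Map<N, N> canonical = new HashMap<N, N>();

    /** Returns the instance the graph actually stores for this logical node. */
    N intern(N node) {
        N existing = canonical.putIfAbsent(node, node);
        return existing != null ? existing : node;
    }

    void addNode(N node) {
        intern(node);
    }

    // addEdge(...) would call intern(...) on both endpoints before storing them,
    // so queryForNode and sourceNode always hand back the same instance.
}
```

This relies on `Node` correctly implementing `equals` and `hashCode`, which the question already assumes.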
S2 - Assuming the graph behaves as in S1 is a mistake. The programmer should take care to always pass the same instance of a node to the graph.
graph.addNode(n1) // n1 is added
graph.addEdge(n1,nOther) // the programmer uses n1 every time he refers to the node
graph.queryForNode({query}) // n1 is returned
graph.queryForEdge({query}).sourceNode() // n1 is returned
S3 - The property is not implemented the right way. It should be information external to the Node class. A collection, such as a HashMap<N, Property>, would work just fine, treating different instances as the same object based on hashCode and equals.
HashMap<N, Property> properties;
graph.addNode(n1) // n1 is added
graph.addEdge(n2,nOther) // graph stores n2
graph.queryForNode({query}) // n1 is returned
graph.queryForEdge({query}).sourceNode() // n2 is returned
// get the property. Difference in instances does not matter
properties.get(n1)
properties.get(n2) //same property is returned
S4 - Same as S3, but we could hide the implementation inside Node, this way:
class Node {
    private static HashMap<N, Property> properties;
    public Property getProperty() {
        return properties.get(this);
    }
}
Edit: added code snippets for current behavior and tentative solutions following Stephen C's answer. To clarify, the whole example comes from using a real graph data structure from an open source Java project.
It seems that S1 makes the most sense. Some Graph implementations internally use a Set<Node> (or some equivalent) to store the nodes. Of course, using a structure like a Set will ensure that there are no duplicate Nodes, where Node n1 and Node n2 are considered duplicates if and only if n1.equals(n2). Of course, the implementation of Node should ensure that all relevant properties are considered when comparing two instances (ie. when implementing equals() and hashCode()).
Some of the issues with the other statements:
S2, while perhaps reasonable, yields an implementation in which the burden falls to the client to understand and safeguard against a potential pitfall of the internal Graph implementation, which is a clear sign of a poorly designed API for the Graph object.
S3 and S4 both seem weird, although perhaps I don't quite understand the situation. In general, if a Node holds on to some data, it seems perfectly reasonable to define a member variable inside class Node to reflect that. Why should this extra property be treated any differently?
To my mind, it comes down to choosing between APIs with strong or weak abstraction.
If you choose strong abstraction, the API would hide the fact that Node objects have identity, and would canonicalize them when they are added to the SampleGraph.
If you choose weak abstraction, the API would assume that Node objects have identity, and it would be up to the caller to canonicalize them before adding them to the SampleGraph.
The two approaches lead to different API contracts and require different implementation strategies. The choice is likely to have performance implications ... if that is significant.
Then there are finer details of the API design that may or may not match your specific use-case for the graphs.
The point is that you need to make the choice.
(This is a bit like deciding to use the collections List interface and its clean model, versus implementing your own linked list data structure so that you can efficiently "splice" 2 lists together. Either approach could be correct, depending on the requirements of your application.)
Note that you usually can make a choice, though the choice may be a difficult one. For example, if you are using an API designed by someone else:
You can choose to use it as-is. (Suck it up!)
You can choose to try to influence the design. (Good luck!)
You can choose to switch to a different API; i.e. a different vendor.
You can choose to fork the API and adjust it to your own requirements (or preferences, if that is what this is about).
You can choose to design and implement your own API from scratch.
And if you really don't have a choice, then this question is moot. Just use the API.
If this is an open-source API then you probably don't have the choice of getting the designers to change it. Significant API overhauls have a tendency of creating a lot of work for other people; i.e. the many other projects that depend on the API. A responsible API designer / design team takes this into account. Or else they find that they lose relevance, because their APIs get a reputation for being unstable.
So ... if you are aiming to influence an existing open-source API design ... 'cos you think they are doing it incorrectly (for some definition of incorrect) ... you are probably better off "forking" the API and dealing with the consequences.
And finally, if you are looking for "best practice" advice, be aware that there are no best practices. And this is not just a philosophical issue. This is about why you will get screwed if you go asking for / looking for "best practice" advice, and then follow it.
As a footnote: have you ever wondered why the Java and Android standard class libraries don't offer any general-purpose graph APIs or implementations? And why they took such a long time to appear in 3rd party libraries (Guava version 20.0)?
The answer is that there is no consensus on what such an API should be like. There are just too many conflicting use-cases and requirement sets.
Let me ask this question using the Java code mentioned in the linked query.
In the same query link, a supplementary answer says: "In general, we need to separate List and Node classes, your List should have an head and your Node will have item and next pointer."
In addition, my class lecture gives two reasons for separating the List and Node classes.
Reason 1:
Assume user X and user Y are pointing to the first List_Node in that list
After user X adds a soap item to the shopping list, the situation is as below.
So user Y's view of the list is now inconsistent.
Reason 2:
Handling empty list.
User X points to an empty list, which means X is null.
X.nth(1); //Null pointer exception
My question:
Reason_1 could have been handled by inserting the new node after the last node. Reason_2 could have been handled as part of an error check in the code.
So why exactly do we need to separate the Node and List classes?
Note: the Java code has an item of type int instead of type Object that could accommodate strings. I did not want to change this code again.
Reason_1 could have been handled by inserting the new node after the last node.
But that is changing the problem. Lists are ordered. Adding an element before an existing element or after an existing element are different operations. The data structure must be able to handle both operations, otherwise it is not a proper list.
Reason_2 could have been handled as part of an error check in the code.
That wouldn't work. The code of your list abstraction can't handle the NPE. If you attempt to call x.nth(1) and x is null, the exception is thrown before you get into any of the code that implements the list. Therefore, the (hypothetical) error handling in the list code cannot be executed. (Java exception handling doesn't work like that ...)
And as you correctly point out in your comment, forcing code that uses a list to handle empty lists as a special case would be bad list API design.
In short, both of the reasons stated are valid. (IMO)
Here are some very good reasons:
Separate implementation from interface. Perhaps in the future someone will find a perfectly good new implementation of your list involving a row of carrier pigeons moving elements around. Should your client code have to update everything with methods flapWings and rewardPigeon instead of manipulating nodes? No! (More realistically, Java offers ArrayList and LinkedList behind the same List interface.)
It makes more sense. list.reverse() makes sense. node.reverse()... what? Does this change every other node recursively or not?
Speaking of that reverse method: if you implement it right now, you can do it in O(N). Did you know that if you keep an int orientation that is 1 or -1, you can implement it in O(1)? But all subsequent operations need to consult that bit, so it's a meta-node operation, not a node operation.
Empty lists are possible. Empty nodes don't make sense. This is your (2). Imagine this: I am a client of your software. You have list == node. A rival vendor offers separation. You say "oh it works fine just add an error check in your code." Your rival does not. Who do I buy from? This is a thought experiment meant to convince you these really are different, and the former does have a defect.
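A minimal sketch of the separation these reasons argue for (the class names `IntList` and `nth` are chosen to match the question; the rest is illustrative): the list object owns the head pointer, so an empty list is a real object rather than a null Node, and the error check lives inside the list where it belongs.

```java
// The list owns the head; Node is a hidden implementation detail.
class IntList {
    private static class Node {
        int item;
        Node next;
        Node(int item, Node next) { this.item = item; this.next = next; }
    }

    private Node head;  // null means "empty list", but the list itself still exists

    void addFirst(int item) {
        head = new Node(item, head);  // every user sharing this IntList sees the change
    }

    int nth(int n) {  // 1-based, to match x.nth(1) in the question
        Node cur = head;
        for (int i = 1; cur != null; i++, cur = cur.next) {
            if (i == n) return cur.item;
        }
        // The list's own error check runs even when the list is empty,
        // because the IntList object itself is never null.
        throw new IndexOutOfBoundsException("list has fewer than " + n + " elements");
    }

    boolean isEmpty() { return head == null; }
}
```

With this design, `new IntList().nth(1)` fails with an exception the list itself controls, instead of a NullPointerException thrown before any list code runs.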
I'm attempting to understand something I read in a research paper that suggested that you can improve performance with shared objects in Java by making use of the RTTI facilities. The main idea is that you have a class with two empty subclasses to indicate the status of an implicit "bit" in the main class. The reference I'm looking at is in the paper here: http://cs.brown.edu/~mph/HellerHLMSS05/2005-OPODIS-Lazy.pdf in Section 3: Performance.
I am attempting to replicate this technique with a data structure that I'm working on. Basically I have:
class Node {
...
}
class ValidNode extends Node {}
class DeletedNode extends Node {}
I then create an object with:
Node item = new ValidNode();
I want to somehow cast my instance of ValidNode to an instance of DeletedNode. I've tried the following:
Node dnode = DeletedNode.class.cast( node );
and
Node dnode = (DeletedNode)node;
However, both terminate with a ClassCastException. I have to assume that what I am attempting to do is a valid technique, since the author (who in turn integrated this technique into the Java 1.6 library) clearly knows what he's doing. However, I don't seem to have enough Java guru-ness to figure out what I'm missing here.
I intend to use something along the lines of
if ( node instanceof DeletedNode ) { // do stuff here
Thank you all in advance.
=============
EDIT:
It looks like the following might work:
class Node {
...
}
class ValidNode extends Node {}
I then create (un-deleted) nodes of type ValidNode. When I wish to mark a node as deleted, I then cast the node up the chain to type Node. I can then test whether a node has been deleted with if (!(node instanceof ValidNode)).
I'll give this a try.
Thing is, all Java knows at compile time is that you've declared your item as a Node. At runtime, however, the JVM knows the actual class of your object. From that moment on, from the Java perspective, casting is only legal along the inheritance chain. Even that might fail if you don't pay enough attention (an object built as a Node cannot be cast to a DeletedNode, for example). Since your Node subtypes are sibling classes, a cast from one to the other is not along the inheritance chain and throws the well-known ClassCastException.
A cursory read of your referenced A Lazy Concurrent List-Based Set Algorithm actually points to an algorithm description in High Performance Dynamic Lock-Free Hash Tables and List-Based Sets.
Notably, the first paper states:
Achieving the effect of marking a bit in the next pointer is done more efficiently than with AtomicMarkableReference by having two trivial (empty) subclasses of each entry object and using RTTI to determine at runtime which subclass the current instance is, where each subclass represents a state of the mark bit.
Heading over to the AtomicMarkableReference documentation we see that this class stores a reference and an associated boxed boolean.
The second referenced paper shows algorithms using nominal subtypes of Node in atomic compare-and-swap operations. Notably there's no casting going on, just some instance swaps.
I could reason that using an AtomicReference might be faster than an AtomicMarkableReference because there's less stuff to get and set during CAS operations. Using subclasses might actually be faster, but code would look like:
AtomicReference<Node> ref = new AtomicReference<Node>();
Node deletedNode = new DeletedNode();
Node validNode = new ValidNode();
...
ref.compareAndSet(validNode, deletedNode); // or some logic
As noted in the comments, there's no way to cast from one subclass to another, you cannot say an "Apple" is a "Banana" even if both are types of "Fruit". You can, however, carry around instances and swap atomic references.
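Putting the pieces together, here is a sketch of the "mark bit via RTTI" idea as this answer reads it (the field names and payload are illustrative, not from the paper's code): instead of casting one sibling to the other, you atomically swap in a new instance of the other subclass and test the state with instanceof.

```java
import java.util.concurrent.atomic.AtomicReference;

class Node {
    final int key;
    Node(int key) { this.key = key; }
}
// Two trivial subclasses: the runtime type *is* the mark bit.
class ValidNode extends Node { ValidNode(int key) { super(key); } }
class DeletedNode extends Node { DeletedNode(int key) { super(key); } }

public class MarkBitDemo {
    public static void main(String[] args) {
        Node valid = new ValidNode(42);
        AtomicReference<Node> ref = new AtomicReference<Node>(valid);

        // "Set the mark bit" by replacing the ValidNode with a DeletedNode
        // carrying the same payload; the CAS fails if another thread won the race.
        boolean marked = ref.compareAndSet(valid, new DeletedNode(valid.key));

        System.out.println(marked);                           // prints: true
        System.out.println(ref.get() instanceof DeletedNode); // prints: true
    }
}
```

No cast between siblings ever happens; the instanceof test plays the role that reading the boolean plays in AtomicMarkableReference.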
In a Java application I need a structure to store, let's call them nodes, and the number of relationships with other nodes. For example, I would need to know that node A is related to B 3 times.
Thinking of a way to implement this, I arrived at a possible solution: have a HashMap using nodes as keys and another HashMap as the value. This inner HashMap would use nodes (node B in the example) as keys and an Integer as the value, representing the number of relationships.
What do you think about this? Is it a good approach?
If so, I have a question. Suppose that I store strings that come from a text file after applying String.split. Now I store "hello" in the first HashMap, but after processing the file this string also appears as a destination node in the second HashMap. Would these strings reference the same object, or would I have multiple copies of the same object?
Regarding the first question, I would do something similar but different. Instead of creating a Hashmap inside a Hashmap I would create a new class Relationship that looks something like this:
public class NodeRelationship {
    private Node relatedNode;
    private int numOfRelations;
    // Constructor + getters and setters
}
And define your map like this: Map<Node, List<NodeRelationship>>. This seems more readable to me (but maybe it's just me) and easier to extend later. For example, if you iterate over the list and want to know the original node, you can add a parent member to NodeRelationship, and so on.
Regarding the second question - it depends on how you create your objects, and whether you create new objects or reuse existing ones. If you have a node hello that you put in your value HashMap (or in the List in my solution) and you use that same object when creating a new key, then there is no duplication. If you have no way of knowing (or simply don't check) whether the node was already created, and you create a new node, then you will have duplicate objects.
If each of your nodes is indeed created from a text string, you can maintain a Map<String, Node>; while reading the file, check this map to see whether a Node already exists before creating a new one. The performance cost is very low, and you can discard the map once the construction of the objects from text is done.
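A sketch of that interning map (the class names `Node` and `NodeInterner` are hypothetical): every label maps to exactly one Node instance, created on first sight.

```java
import java.util.HashMap;
import java.util.Map;

class Node {
    final String label;
    Node(String label) { this.label = label; }
}

public class NodeInterner {
    private final Map<String, Node> byLabel = new HashMap<String, Node>();

    /** Returns the existing Node for this label, creating it only once. */
    Node nodeFor(String label) {
        return byLabel.computeIfAbsent(label, Node::new);
    }

    public static void main(String[] args) {
        NodeInterner interner = new NodeInterner();
        Node a = interner.nodeFor("hello");
        Node b = interner.nodeFor("hello");
        System.out.println(a == b);  // prints: true - one shared instance, no duplicates
    }
}
```

Both the key side and the destination side of your relationship maps would go through `nodeFor`, so "hello" can never exist twice.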
I have two classes:
class Node {
    int address;
}

class Link {
    int latency;
    int bandwidth;
    Node node1;
    Node node2;
}
public Link [] link= new Link[Nmax];
if I want to create a link between two nodes, it is easy, I've just to:
node1=new Node(); //and then I add parameter like address and so on
node2= new Node();//...............
link[1]= new Link();
link[1].node1=node1;
link[1].node2=node2;
link[1].latency=15; //and so on, we suppose that we have 100 nodes and 60 links
Now, during the program, sometimes we add new nodes and then have to add links between them. I can do this in the same manner as above. My question is:
what I have to do if I want to delete a node ? (links between this node and other existing nodes must be deleted too)
--- Edited in response to jpm's excellent observation ---
In your case, you are doing all of the data structure management yourself, but you are not storing enough information to undo additions to the data structure.
You need to store enough information at creation time to support the other operations on the data structure. This means that an array may be a poor choice for your high-level exposed data structure, because there is no guarantee that additions will maintain sufficient information to support removals.
Wrap the array in an object, and write the code in the add(...) method to support the efficient removal. This probably means storing more information, which was constructed specifically for the support of removal.
--- Original post follows ---
To delete an object in Java, ensure that nothing "points" to it (has a reference to it) and then wait. Automatic garbage collection will do the deleting for you.
An example to make this clear
link[1] = new Link();
// now we have an extra Link in our array
link[1] = null;
// now garbage collection will delete the recently added Link object.
Note that if you have two or three references to the created Link object, it will not be collected until all the references are lost
link[1] = new Link();
// now we have an extra Link in our array
Link current = link[1];
// now we have two references to the newly created Link
link[1] = null;
// now we have one reference to the newly created Link
current = null;
// now the newly created Link is a candidate for garbage collection
The way this is implemented is that there is a top-level Thread of the user's program. If that thread can reach the Object in question, then it won't get garbage collected. This means that rings and meshes of Objects that are no longer reachable from live Threads will be collected en masse.
Before deleting a node, loop over your links and remove any that have the node to be deleted as node1 or node2; then delete the node.
You probably want to explore using a better data structure than an array for this use case. You want to be able to inspect all Links, figure out which ones refer to the deleted Node, and remove those Links. Start looking at List/Set and see if they suit your needs, and slowly evolve into a good implementation that gives you what you need.
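A sketch of what that could look like with a List instead of a bare array (the `Network` wrapper and `deleteNode` method are hypothetical names, following the earlier advice to wrap the structure in an object): remove every Link touching the node, then drop the node itself, and let the garbage collector do the rest.

```java
import java.util.ArrayList;
import java.util.List;

class Node { int address; }

class Link {
    int latency;
    int bandwidth;
    Node node1;
    Node node2;
}

public class Network {
    final List<Node> nodes = new ArrayList<Node>();
    final List<Link> links = new ArrayList<Link>();

    void deleteNode(Node n) {
        // Remove every link that touches this node...
        links.removeIf(l -> l.node1 == n || l.node2 == n);
        // ...then the node itself. With no remaining references,
        // the garbage collector reclaims the objects eventually.
        nodes.remove(n);
    }
}
```

Unlike the fixed-size array, the lists grow and shrink as nodes come and go, and `removeIf` keeps the link cleanup to one line.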
"Deleting" an object in Java means removing all references that point to it. The garbage collector will then eventually free the memory occupied by the object. So what you need to do is set all references to the specific Node object to null.
// lets delete node1
link[1].node1 = null;
node1 = null;
// at some point the object will be deleted