Test depth-first tree - Java

I made a Java program that traverses a tree depth-first. The program is correct, but the choice of which child of a node to visit next is random. For example, in this tree:
sometimes the result is:
A-B-E-C-F-D
A-C-F-D-B-E
A-B-E-D-C-F
I want to write unit tests for this program, but I have no idea how to do it.
I thought about creating a List that contains the expected elements and comparing it with the result of my depth-first traversal, but since the traversal result is random, I cannot compare it against a single fixed List.

There are 2 properties you want to test for:
Each node is visited exactly once
The traversal is depth-first
The first is easy to test: the number of unique nodes visited must equal the number of nodes in the tree. You can test that against any random tree.
The second is slightly trickier - expressing it in the general case is probably more complex than the code being tested. It is easier to just pick some representative constraints based on the specific known data (see the test sketch at the end of this answer), e.g.
B must be after A
E must be immediately after B
...
Hard to conceive of realistic code that satisfies the first property for all trees, but would fail the second only in specific cases. So outside of the most formal of safety critical systems (and what are they doing using dynamic data structures anyway?), that's going to be enough.
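For illustration, here is a rough JUnit sketch of both checks. It assumes a hypothetical Tree fixture with a depthFirstTraversal() method that returns a List<String> of visited labels and a size() method; adapt the names to your own code.

import static org.junit.Assert.*;

import java.util.HashSet;
import java.util.List;
import org.junit.Test;

public class DepthFirstTraversalTest {

    @Test
    public void visitsEachNodeExactlyOnce() {
        Tree tree = buildSampleTree();                      // hypothetical fixture: A(B(E), C(F), D)
        List<String> visited = tree.depthFirstTraversal();  // hypothetical API
        assertEquals(tree.size(), visited.size());
        assertEquals(tree.size(), new HashSet<>(visited).size()); // no node visited twice
    }

    @Test
    public void respectsDepthFirstConstraints() {
        List<String> v = buildSampleTree().depthFirstTraversal();
        assertEquals("A", v.get(0));                        // the root is visited first
        assertEquals(v.indexOf("B") + 1, v.indexOf("E"));   // E comes immediately after B
        assertEquals(v.indexOf("C") + 1, v.indexOf("F"));   // F comes immediately after C
    }

    private Tree buildSampleTree() {
        // construct the example tree A(B(E), C(F), D) with your own Tree API
        throw new UnsupportedOperationException("replace with your tree construction code");
    }
}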

I haven't clicked on your link, but if the code is truly random and is intended to be, then you should make your unit test so that it says "given this input, then the output must be one of these three things". This isn't ideal because it might take many, many runs before a bug shows up (i.e. the first few times you run it, it might just randomly mask the bug), but I suspect it's the best you can do for testing an algorithm with random behaviour.
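For instance, sticking with the same hypothetical Tree fixture as in the earlier test sketch (and remembering to list every order that is actually valid for your tree), such a test could look like this, with java.util.Arrays and java.util.Set added to the imports:

    @Test
    public void resultIsOneOfTheValidTraversals() {
        Set<List<String>> valid = new HashSet<>(Arrays.asList(
                Arrays.asList("A", "B", "E", "C", "F", "D"),
                Arrays.asList("A", "C", "F", "D", "B", "E"),
                Arrays.asList("A", "B", "E", "D", "C", "F")
                // ...plus any other orders that are valid for this tree
        ));
        assertTrue(valid.contains(buildSampleTree().depthFirstTraversal()));
    }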

This means that the order of the children of each node is not deterministic. You probably used a Set to hold children. Consider using a LinkedHashSet (which preserves insertion order) or a SortedSet (which sorts children). This way, the order will always be the same.
If randomness is a feature of your tree and you want to keep it as is, then see the other answers, or change the algorithm itself to make sure you always sort the children while traversing the tree.
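A quick standalone illustration of the difference (not your tree code, just the ordering behaviour of LinkedHashSet):

import java.util.LinkedHashSet;
import java.util.Set;

public class ChildOrderDemo {
    public static void main(String[] args) {
        // A LinkedHashSet iterates in insertion order, so the children always
        // come back in the order in which they were added.
        Set<String> children = new LinkedHashSet<>();
        children.add("B");
        children.add("C");
        children.add("D");
        System.out.println(children); // always prints [B, C, D]
    }
}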

Choose a data set for the unit test that has just a few valid results (it should however have more than one, obviously), and test whether the result is one of them.
Alternatively, you could try to impose a well-defined order on the nodes (e.g. by alphabetically sorting the children of each node, instead of managing them in a Set).


Why do I separate the List and Node classes?

Let me ask this question using the Java code mentioned in the linked question.
In that same question, a supplementary answer says: "In general, we need to separate List and Node classes, your List should have an head and your Node will have item and next pointer."
In addition, my class lecture says there are two reasons for separating the List and Node classes.
Reason 1:
Assume user X and user Y are both pointing to the first List_Node in that list.
After user X adds a soap item to the shopping list, below is the situation:
So user Y now has an inconsistent view of the list.
Reason 2:
Handling an empty list.
If user X points to an empty list, that means X is null.
X.nth(1); // NullPointerException
My question:
Reason 1 could have been handled by inserting the new node after the last node. Reason 2 could have been handled as part of an error check in the code.
So, why exactly do we need to separate the Node and List classes?
Note: the Java code has item of type int instead of type Object (which could accommodate strings). I did not want to change this code again.
Reason 1 could have been handled by inserting the new node after the last node.
But that is changing the problem. Lists are ordered. Adding an element before an existing element or after an existing element are different operations. The data structure must be able to handle both operations, otherwise it is not a proper list.
Reason 2 could have been handled as part of an error check in the code.
That wouldn't work. The code of your list abstraction can't handle the NPE. If you attempt to call x.nth(1) and x is null, the exception is thrown before you get into any of the code that implements the list. Therefore, the (hypothetical) error handling in the list code cannot be executed. (Java exception handling doesn't work like that ...)
And as you correctly point out in your comment, forcing code that uses a list to handle empty lists as a special case would be bad list API design.
In short, both of the reasons stated are valid. (IMO)
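To make the discussion concrete, here is a minimal sketch of the kind of separation being described (illustrative names, not the code from the linked question). The List owns the head, so users X and Y can share one list object, and the empty case is handled inside the list itself:

import java.util.NoSuchElementException;

public class ShoppingList {
    // Node is an implementation detail: it only holds the item and the next pointer.
    private static class Node {
        int item;
        Node next;
        Node(int item, Node next) { this.item = item; this.next = next; }
    }

    private Node head; // null when the list is empty

    public void addFirst(int item) {
        head = new Node(item, head); // X and Y share the ShoppingList object, so both see the change
    }

    public int nth(int n) {
        Node current = head;
        for (int i = 1; i < n && current != null; i++) {
            current = current.next;
        }
        if (current == null) {
            // the list itself reports the error; callers never dereference a null list
            throw new NoSuchElementException("the list has fewer than " + n + " elements");
        }
        return current.item;
    }
}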
Here are some very good reasons:
Separate implementation from interface. Perhaps in the future someone will find a perfectly good new implementation of your list involving a row of carrier pigeons moving elements around. Should your client code have to update everything with methods flapWings and rewardPigeon instead of manipulating nodes? No! (More realistically, Java offers ArrayList and LinkedList, IIRC.)
It makes more sense. list.reverse() makes sense. node.reverse()... what? Does this change every other node recursively or not?
Speaking of that reverse method: if you implement it right now, you can implement it in O(N). Did you know that if you keep an int orientation that is 1 or -1, you can implement it in O(1)? But all subsequent operations need to consult that bit, so it is a meta-node operation, not a node operation (a sketch follows at the end of this answer).
Empty lists are possible. Empty nodes don't make sense. This is your (2). Imagine this: I am a client of your software. You have list == node. A rival vendor offers separation. You say "oh it works fine just add an error check in your code." Your rival does not. Who do I buy from? This is a thought experiment meant to convince you these really are different, and the former does have a defect.
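Here is a small sketch of that orientation-bit idea from the third point (illustrative only): the flag lives in the list, and the node class never needs to know about it.

public class OrientedList {
    private static class Node {
        int item;
        Node prev, next;
        Node(int item) { this.item = item; }
    }

    private Node head, tail;
    private boolean reversed; // the "orientation bit"

    public void addLast(int item) {
        Node n = new Node(item);
        if (reversed) {                       // a logical "last" is a physical "first" when reversed
            n.next = head;
            if (head != null) head.prev = n;
            head = n;
            if (tail == null) tail = n;
        } else {
            n.prev = tail;
            if (tail != null) tail.next = n;
            tail = n;
            if (head == null) head = n;
        }
    }

    public void reverse() {
        reversed = !reversed;                 // O(1): no node is touched
    }

    public Integer first() {
        Node n = reversed ? tail : head;      // every operation consults the bit
        return n == null ? null : n.item;
    }
}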

Java LinkedHashSet remove some elements from the end

I am working on a problem where I need to store elements with two requirements: no duplication and maintaining insertion order. I chose to go with LinkedHashSet since it fulfills both of my requirements.
Let's say I have this code:
LinkedHashSet<String> hs = new LinkedHashSet<String>();
hs.add("B");
hs.add("A");
hs.add("D");
hs.add("E");
hs.add("C");
hs.add("F");
if (hs.contains("D")) {
    // do something to remove the elements added after "D", i.e. remove "E", "C" and "F"
    // maybe hs.removeAll(Collection<?> c) ??
}
Can anyone please guide me with the logic to remove these elements?
Am I using the wrong data structure? If so, what would be a better alternative?
I think you may need to use an iterator to do the removal if you are using a LinkedHashSet. That is to say, find the element, then keep removing until you get to the tail. This will be O(n). Even if you wrote your own LinkedHashSet (with a doubly linked list and a hash set), you would have access to the raw linking structure and could cut the linked list in O(1), but you would still need to remove all of the elements you just cut from the hash set, which is where the O(n) cost would arise again.
So in summary, remove the element, then keep an iterator to that element and continue to walk down removing elements until you get to the end. I'm not sure if LinkedHashSet exposes the required calls, but you can probably figure that out.
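A sketch of that approach; the plain Iterator does expose what is needed here, namely remove():

import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

public class RemoveAfterDemo {
    public static void main(String[] args) {
        Set<String> hs = new LinkedHashSet<>();
        hs.add("B"); hs.add("A"); hs.add("D");
        hs.add("E"); hs.add("C"); hs.add("F");

        boolean pastD = false;
        for (Iterator<String> it = hs.iterator(); it.hasNext(); ) {
            String element = it.next();
            if (pastD) {
                it.remove();          // drop everything that was added after "D"
            } else if (element.equals("D")) {
                pastD = true;         // keep "D" itself; start removing from the next element
            }
        }
        System.out.println(hs);       // [B, A, D]
    }
}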
You could write your own version of an ArrayList that doesn't allow for duplicates, by overriding add() and addAll(). To my knowledge, there is no "common" 3rd party version of such, which has always surprised me. Anybody know of one?
Then the removal code is pretty simple (no need to use a ListIterator):
int idx = this.indexOf("D");
if (idx >= 0) {
for (int goInReverse = this.size()-1; goInReverse > idx; goInReverse--)
this.remove(goInReverse);
}
However, this is still O(N), because you loop through every element of the List.
The basic problem here is that you have to maintain two data structures, a "map" one representing the key / value mapping, and a "list" other representing the insertion order.
There are "map" and "list" organizations that offer fast removal of a elements after a given point; e.g. ordered trees of various kinds and both array and chain-based lists (modulo the cost of locating the point.)
However, it seems impossible to remove N elements from the two data structures in better than O(N). You have to visit all of the elements being removed to remove them from the 2nd data structure. (In fact, I suspect one could prove this mathematically ...)
In short, there is no data structure that has better complexity than what you are currently using.
The area where it is possible to improve performance (with a custom collection class!) is in avoiding an explicit use of an iterator. Using an iterator and the standard iterator API, the cost is O(N) on the total number of elements in the data structure. You could make this O(N) on the number of elements removed ... if the hash entry nodes also had next/prev links for the sequence.
So, after trying a couple of the things mentioned above, I chose to implement a different data structure, since the O(n) cost was not an issue for this problem (my data is very small).
I used graphs; this library came in really handy: http://jgrapht.org/
What I am doing is adding all the elements as vertices to a DirectedGraph and also creating edges between them (the edges helped me solve another, unrelated problem as well). When it's time to remove the elements, I use a recursive function with the following pseudo code:
removeElements(element) {
    tempEdge = graph.getOutgoingEdgeFrom(element)
    if (tempEdge == null)   // no outgoing edge: nothing was added after this element
        return
    tempVertex = graph.getTargetVertex(tempEdge)
    removeElements(tempVertex)
    graph.remove(tempVertex)
}
I agree that a graph data structure is not a good fit for this kind of problem, but under my conditions it works perfectly... Cheers!

Ever worth using an array of length 2 instead of two variables for syntactical reasons?

So I was implementing my own binary search tree and noticed that an ugly if statement appears all too often the way I'm doing it (which possibly isn't the best way, but that's not what we're discussing), based on whether a child of a node is the left or right child, e.g.:
if (leftChild)
    parent.setLeft(child.getRight());
else
    parent.setRight(child.getRight());
Then I thought of this:
parent.setChild(childIndex, child.getRight());
Where childIndex is a byte that was determined earlier, at the point where leftChild would otherwise have been determined.
As you can see, this is much more concise, but to have it this way I would either have to have an if statement in the setChild method or represent the children as an array of length 2. If we pretend here that this BST requires maximum performance/space efficiency, what kind of trade-off would it be to switch the storage of child node references to a 2-element array rather than a pair of variables (or to just hide the if statement inside the setChild method)?
I know in a real-world situation this might not matter that much, but I'm still interested in which would be the best approach.
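For concreteness, the two representations I am weighing would look roughly like this (names are illustrative):

// Children as two named fields: the branch lives in the setter.
class NodeWithFields {
    NodeWithFields left, right;

    void setChild(boolean leftChild, NodeWithFields child) {
        if (leftChild) {
            left = child;
        } else {
            right = child;
        }
    }
}

// Children as a 2-element array: no branch, but one extra heap-allocated object per node.
class NodeWithArray {
    final NodeWithArray[] children = new NodeWithArray[2]; // index 0 = left, 1 = right

    void setChild(int childIndex, NodeWithArray child) {
        children[childIndex] = child;
    }
}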
As I understand it, you're asking, which of the following two is more efficient:
if (condition)
    x = value
else
    y = value
and
xyArray[index] = value
First of all any answer is compiler and/or JVM implementation dependent since the JLS doesn't mention any execution times for any types of statements.
That being said, I would suspect that the if-statement would be slightly faster since the JVM doesn't have to compute any array offset and check array bounds.
In any case, this smells like premature optimization to me. Choose approach based on which one is easiest to read and debug.
I believe that if you store the children as two fields, you're not going to have just one if; your code is going to crawl with them. Still, the if is in all probability more efficient, and it is also more efficient memory-wise to have two fields instead of yet another heap-allocated object (a full object at that, with its usual overhead). If your tree is going to be really, really huge (millions of nodes), this is a concern; also if each node's payload is tiny compared to the overhead.

What does deterministic mean?

I am reading the Java HashMap documentation but I don't understand this sentence.
Note that the iteration order for HashMap is non-deterministic. If you want deterministic iteration, use LinkedHashMap.
What does deterministic mean?
The simplest definition:
Given the same inputs, you always get the same outputs.
Above, it's saying that iterating through the exact same HashMap may give different results at different times, even when you haven't changed anything. Usually that doesn't matter, but if it does, you should use a LinkedHashMap.
In an order which can be "determined" in advance.
Because of the way hashing works, the elements in the map are "scrambled" into arbitrary locations. The scrambling positions cannot easily be determined in advance -- they aren't determinable -- you don't know the resulting order.
In simpler terms: When you call keys(), values() or entrySet() you get back a collection, over which you can iterate. That line is saying you can't expect the order in which the iterator returns objects will be any particular order. Especially, it can be different from both the insertion order and the natural ordering by key values.
If you want the iterator to work in insertion order, use a LinkedHashMap. If you want to iterate by key value, use a TreeMap. Be aware that both of these have slightly worse performance than a plain HashMap, as they both have to do extra work to keep track of the order.
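For instance, a TreeMap always iterates in key order, no matter how the entries were inserted (a small standalone example):

import java.util.Map;
import java.util.TreeMap;

public class KeyOrderDemo {
    public static void main(String[] args) {
        Map<String, Integer> byKey = new TreeMap<>();
        byKey.put("c", 46);
        byKey.put("a", 5);
        byKey.put("b", 16);
        System.out.println(byKey); // always {a=5, b=16, c=46}, sorted by key
    }
}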
Strictly speaking, HashMap iteration order is almost certainly not non-deterministic. Like the vast majority of computational processes, if you go through it exactly the same way, the results will be exactly the same. A truly non-deterministic system would incorporate some external random element, which is highly unlikely to be the case here. At least in most cases.
What they really mean, I think, is that just because the map contains a particular set of elements, you shouldn't expect that when you iterate over them they will come out in a particular order. That doesn't mean the order of iteration is random, it just means that as a developer you shouldn't bank on knowing what it is.
In most cases, the reason for this is that there will be some dependency on some implementation details that can vary from platform to platform, and/or on order of access. And the latter may in turn be determined by thread scheduling and event timing, which are innately unpredictable.
In most cases, on any individual platform and with the most common threading model -- a single threaded application -- if you always insert and delete a particular set of things in sequence X you will always get them out in sequence Y. It's just that Y will be so exactly dependent on X, and on the platform, that there's no point even thinking about what it's going to be.
Basically, even though it isn't random, it might just as well be.
deterministic : can be determined
non-deterministic : can't be determined
A deterministic algorithm is one that, when given a particular input, will always produce the same output.
A good example I found:
Consider a shopping list: a list of items to buy.
It can be interpreted in two ways:
* The instruction to buy all of those items, in any order. This is a non-deterministic algorithm.
* The instruction to buy all of those items, in the order given. This is a deterministic algorithm.
Deterministic means the result is predictable / foreseeable.
Non-deterministic means that there isn't one single result that you can figure out beforehand. An arithmetical expression, like 1 + 2 or log e, is deterministic. There's exactly one correct answer and you can figure it out upfront. Throw a handful of sand in the air, and where each grain will fall is effectively non-deterministic to any useful degree of accuracy.
This probably isn't precisely correct, as you could look at the source code of the underlying library and JVM implementation and there would probably be some way that you could determine the resulting ordering. It might be more correct for them to say, "No particular order is guaranteed," or something of that sort.
What's relevant in this case is that you can't rely on the ordering.
This is the property of HashMap whereby elements are not iterated in the same order in which they were inserted, because HashMap does not store elements in insertion order. Hence the note in the documentation.
Non-deterministic means there is no well-defined behaviour.
In the case of HashMap, depending on how you inserted the elements, you might get one or another order of iteration.
HashMap doesn't maintain the order in which you add entries; if you want the output in the order you added them, you should use LinkedHashMap. So here, deterministic means the iteration order matches the insertion order.
Here is an example:
1. Non-deterministic:
HashMap<String, Integer> map = new HashMap<String, Integer>();
map.put("a", 5);
map.put("b", 16);
map.put("c", 46);
System.out.println(map); // output: {a=5, c=46, b=16} (one possible order; not guaranteed)
2. Deterministic:
Map<String, Integer> map = new LinkedHashMap<String, Integer>();
map.put("a", 5);
map.put("b", 16);
map.put("c", 46);
System.out.println(map); // output: {a=5, b=16, c=46} (always insertion order)

Is it OK to have a Java Comparator where order can change dynamically?

I have a set of time-stamped values I'd like to place in a sorted set.
public class TimedValue {
    public Date time;
    public double value;

    public TimedValue(Date time, double value) {
        this.time = time;
        this.value = value;
    }
}
The business logic for sorting this set says that values must be ordered in descending value order, unless a value is more than 7 days older than the newest value.
So as a test, I came up with the following code...
DateFormat dateFormatter = new SimpleDateFormat("MM/dd/yyyy");
TreeSet<TimedValue> mySet = new TreeSet<TimedValue>(new DateAwareComparator());
mySet.add(new TimedValue(dateFormatter.parse("01/01/2009"), 4.0 )); // too old
mySet.add(new TimedValue(dateFormatter.parse("01/03/2009"), 3.0)); // Most relevant
mySet.add(new TimedValue(dateFormatter.parse("01/09/2009"), 2.0));
As you can see, initially the first value is more relevant than the second, but once the final value is added to the set, the first value has expired and should be the least relevant.
My initial tests say that this should work... that the TreeSet will dynamically reorder the entire list as more values are added.
But even though I see it, I'm not sure I believe it.
Will a sorted collection reorder the entire set as each element is added? Are there any gotchas to using a sorted collection in this manner (i.e. performance)? Would it be better to manually sort the list after all values have been added (I'm guessing it would be)?
Follow-up:
As many (and even I to a certain extent) suspected, the sorted collection does not support this manner of "dynamic reordering". I believe my initial test was "working" quite by accident. As I added more elements to the set, the "order" broke down quite rapidly. Thanks for all the great responses, I refactored my code to use approaches suggested by many of you.
I don't see how your comparator can even detect the change, unless it remembers the newest value it's currently seen - and that sounds like an approach which is bound to end in tears.
I suggest you do something along the following lines (a rough sketch follows after the list):
Collect your data in an unordered set (or list)
Find the newest value
Create a comparator based on that value, such that all comparisons using that comparator will be fixed (i.e. it will never return a different result based on the same input values; the comparator itself is immutable although it depends on the value originally provided in the constructor)
Create a sorted collection using that comparator (in whatever way seems best depending on what you then want to do with it)
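A rough sketch of such a comparator, reusing the TimedValue class from the question (the 7-day constant and the tie-breaking rule are my own assumptions; adjust as needed):

import java.util.Comparator;
import java.util.Date;

// Immutable comparator: the reference "newest" date is fixed at construction time,
// so any two values always compare the same way for the lifetime of this comparator.
class RelevanceComparator implements Comparator<TimedValue> {
    private static final long SEVEN_DAYS_MS = 7L * 24 * 60 * 60 * 1000;
    private final long newestMs;

    RelevanceComparator(Date newest) {
        this.newestMs = newest.getTime();
    }

    @Override
    public int compare(TimedValue a, TimedValue b) {
        boolean aExpired = newestMs - a.time.getTime() > SEVEN_DAYS_MS;
        boolean bExpired = newestMs - b.time.getTime() > SEVEN_DAYS_MS;
        if (aExpired != bExpired) {
            return aExpired ? 1 : -1;                   // expired values sort last
        }
        int byValue = Double.compare(b.value, a.value); // otherwise descending by value
        return byValue != 0 ? byValue : a.time.compareTo(b.time); // tie-break so a TreeSet keeps both
    }
}

Once the newest timestamp is known you can build the set in one go, e.g. new TreeSet<>(new RelevanceComparator(newestDate)), and rebuild it if a newer value arrives later.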
I would advise against this for a few reasons:
Since it's basically a red-black tree behind the scenes (which doesn't necessarily have to be rebuilt from scratch on every insertion), you might easily end up with values in the wrong part of the tree (invalidating most of the TreeSet API).
The behavior is not defined in the spec, and thus may change later even if it's working now.
In the future, when anything goes strangely wrong in anything remotely touching this code, you'll spend time suspecting that this is the cause.
I would recommend either recreating/re-sorting the TreeSet before searching it or (my preference) iterating through the set before the search and removing any of the objects that are too old. You could even, if you wanted to trade some memory for speed, keep a second list ordered by date and backed by the same objects, so that all you would have to do to filter your TreeSet is remove objects from it based on the time-sorted list.
I don't believe the JDK libraries or even 3rd party libraries are written to handle a comparator whose results are not consistent. I wouldn't depend on this working. I would worry more if your Comparator can return not-equal for two values when called one time and can return equal for the same two values if called later.
Read carefully the contract of Comparator.compare(). Does your Comparator satisfy those constraints?
To elaborate, if your Comparator returns that two values are not equals when you call it once, but then later returns that the two values are equal because a later value was added to the set and has changed the output of the Comparator, the definition of "Set" (no duplicates) becomes undone.
Jon Skeet's advice in his answer is excellent advice and will avoid the need to worry about these sorts of problems. Truly, if your Comparator does not return values consistent with equals() then you can have big problems. Whether or not a sorted set will re-sort each time you add something, I wouldn't depend on, but the worst thing that would occur from order changing is your set would not remain sorted.
No, this won't work.
If you are using comparable keys in a collection, the results of the comparison between two keys must remain the same over time.
When storing keys in a binary tree, each fork in the path is chosen as the result of the comparison operation. If a later comparison returns a different result, a different fork will be taken, and the previously stored key will not be found.
I am 99% certain this will not work. If a value in the Set suddenly changes its comparison behaviour, it is possible (quite likely, actually) that it will not be found anymore; i.e. set.contains(value) will return false, because the search algorithm will at one point do a comparison and continue in the wrong subtree because that comparison now returns a different result than it did when the value was inserted.
I think the non-changing nature of a Comparator is supposed to be on a per-sort basis, so as long as you are consistent for the duration of a given sorting operation, you are ok (so long as none of the items cross the 7 day boundary mid-sort).
However, you might want to make it more obvious that you are asking specifically about a TreeSet, which I imagine re-uses information from previous sorts to save time when you add a new item so this is a bit of a special case. The TreeSet javadocs specifically defer to the Comparator semantics, so you are probably not officially supported, but you'd have to read the code to get a good idea of whether or not you are safe.
I think you'd be better off doing a complete sort when you need the data sorted, using a single time as "now" so that you don't risk jumping that boundary if your sort takes long enough to make it likely.
It's possible that a record will change from <7 days to >7 days mid-sort, so what you're doing violates the rules for a comparator. Of course this doesn't mean it won't work: lots of things that are documented as "unpredictable" in fact work if you know exactly what is happening internally.
I think the textbook answer is: This is not reliable with the built-in sorts. You would have to write your own sort function.
At the very least, I would say that you can't rely on a TreeSet or any "sorted structure" magically resorting itself when dates roll over the boundary. At best this might work if you re-sort just before displaying, and don't rely on anything remaining correct between updates.
At worst, inconsistent comparisons might break the sorts badly. You have no assurance that this won't put you into an infinite loop or some other deadly black hole.
So I'd say: read the source code from Sun for whatever classes or functions you plan to use, and see if you can figure out what will happen. Testing is good, but there are potentially tricky cases that are difficult to test. The most obvious is: what if, while it's in the process of sorting, a record rolls over the date boundary? That is, it might look at a record once and say it's <7 but the next time it sees it it's >7. That could be bad, bad news.
One obvious trick that occurs to me: Convert the date to an age at the time you add the record to the structure, rather than dynamically. That way it can't change within the sort. If the structure is going to live for more than a few minutes, recalculate the ages at some appropriate time and then re-sort. I doubt someone will say your program is incorrect because you said a record was less than 7 days old when really it's 7 days, 0 hours, 0 minutes, and 2 seconds old. Even if someone noticed, how accurate is their watch?
As already noted, the Comparator cannot do this for you, because the comparisons are not consistent over time. Basically, in order to be able to sort the items, you must be able to compare any two of them independently of the rest, which here you cannot do. So your scenario would either not work or would produce inconsistent results.
Maybe something simpler would be good enough for you:
apply a simple Comparator that compares the values as you need,
and simply remove from your list/collection all elements that are 7 days older than the newest. Basically, whenever a new item is added, check whether it is the newest, and if it is, remove those that are more than 7 days older than it (a rough sketch follows below).
This would not work if you also remove items from the list, in which case you would need to keep all those you removed in a separate list (which, by the way, you would sort by date) and add them back to the original list in case the MAX(date) is smaller after removal.
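A rough sketch of that prune-on-insert idea, again reusing TimedValue from the question (Java 8+, names are illustrative):

import java.util.Comparator;
import java.util.Date;
import java.util.List;

class TimedValueList {
    private static final long SEVEN_DAYS_MS = 7L * 24 * 60 * 60 * 1000;

    static void addAndPrune(List<TimedValue> values, TimedValue newItem) {
        values.add(newItem);
        // find the newest timestamp currently in the list
        Date newest = values.stream()
                .map(v -> v.time)
                .max(Date::compareTo)
                .get();                                  // safe: the list is never empty here
        long cutoff = newest.getTime() - SEVEN_DAYS_MS;
        values.removeIf(v -> v.time.getTime() < cutoff); // drop values more than 7 days older
        values.sort(Comparator.comparingDouble((TimedValue v) -> v.value).reversed());
    }
}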
