Implementing Bentley–Ottmann Algorithm with an AVL tree - java

I'm having a problem implementing this method in java. I'm specifically implementing the algorithm FINDINTERSECTIONS in Computational Geometry 3rd Edition using an AVL BST tree for the status. The description from the book is shown below:
The problem I'm having is implementing step 5 in HANDLEEVENTPOINT. When the event point is an intersection, the status is no longer totally ordered there: the intersecting segments cross at that point and must be swapped in the status. Since the BST I'm using is an AVL tree, the delete method fails because rebalancing requires proper ordering of the elements (i.e. delete assumes the tree is properly ordered, and performs rotations with respect to that order to maintain log(n) height). Also, the status I'm using stores the data in the nodes instead of in the leaves as shown in the figure. If I understand correctly, the book says that either kind of tree can be used.

First off use a leaf version of a balanced binary search tree whether red-black or AVL. I used red-black.
Get Peter Brass's book on advanced data structures because you will have trouble finding anything on these leaf trees in virtually all the standard algorithm / data structure books. I believe they are also called exogenous trees.
http://www-cs.engr.ccny.cuny.edu/~peter/
Also, you can look at "Algorithms and Data Structures: The Basic Toolbox" by Mehlhorn and Sanders which goes into the "sorted sequence" data structure. They create these with the help of only leaf trees when trees are used. These are also some of the folks that developed LEDA.
Also look at the LEDA book online because it has a chapter on how to implement this algorithm and how to handle ALL the "problem cases." I think this is chapter 9, and it is a bit hard to follow, maybe because English is not the native tongue of the authors ... PITA!!
http://people.mpi-inf.mpg.de/~mehlhorn/LEDAbook.html
You can doubly link the leaf nodes' data items together, and you have created a sorted sequence with the tree as a navigation structure to the linked list of items. That is how LEDA and, I think, CGAL do this.
Duplicate items are handled differently in the event queue than in the sweep line status structure. For the event queue, just add a linked list of items to each leaf (see Brass's book). Here each leaf corresponds to an event point and has a list of all segments whose starting endpoint is the same as the event point. So some leaves, like intersection event points and ending event points, will have empty lists. At least that is how some implementations do this.
For the sweep status structure, overlapping parallel segments are differentiated by, say, segment ids. They do not talk about these in the book you are reading/referencing. However, the LEDA book tells you how to handle them. So even though the sweep status tree's comparator says two segments have the same endpoint and orientation, the comparator breaks the tie by using the segments' indexes in the segment database, array or whatever.
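As a rough illustration of the tie-breaking idea, here is a minimal sketch of a status comparator that falls back to the segment id when two segments meet the sweep line at the same place. Everything in it is an assumption made for the example: the `Segment` and `StatusOrder` classes, the `sweepY` field, and the horizontal sweep line moving downward as in the book. It also uses plain `double` arithmetic for brevity and skips the orientation comparison a fuller comparator would do before falling back to the id.

```java
import java.util.Comparator;

// Hypothetical segment type: (x1, y1) is the upper endpoint, (x2, y2) the lower one.
final class Segment {
    final int id;                 // index in the segment array, used only to break ties
    final double x1, y1, x2, y2;

    Segment(int id, double x1, double y1, double x2, double y2) {
        this.id = id;
        this.x1 = x1; this.y1 = y1;
        this.x2 = x2; this.y2 = y2;
    }

    // x-coordinate where this segment meets the horizontal line at height y.
    double xAt(double y) {
        if (y1 == y2) {
            return x1;            // horizontal segment: an arbitrary convention for this sketch
        }
        return x1 + (x2 - x1) * (y - y1) / (y2 - y1);
    }
}

// Orders segments along the sweep line; ties are broken by id so that
// overlapping segments are still distinct keys in the status tree.
final class StatusOrder implements Comparator<Segment> {
    double sweepY;                // current position of the sweep line (set by the caller)

    @Override
    public int compare(Segment a, Segment b) {
        int byPosition = Double.compare(a.xAt(sweepY), b.xAt(sweepY));
        if (byPosition != 0) {
            return byPosition;
        }
        return Integer.compare(a.id, b.id);
    }
}
```

With this ordering, two overlapping vertical segments never compare as equal unless they are the same segment, so the tree can store both.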
Some more important points:
Pool points! The points in this common pool are the basic objects that make up the segments and are used in all the data structures. Using the pool allows one to test for point equality by just testing for identity! This avoids using a comparator, which slows things down and can introduce errors.
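A minimal sketch of such a pool, assuming integer coordinates so that equal points really do map to the same key. The `PointPool` class and its `intern` method are hypothetical names for this example, not something from LEDA or the book.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hands out exactly one Point object per distinct coordinate pair.
final class PointPool {
    // Immutable point; the private constructor means only the pool creates instances.
    static final class Point {
        final long x, y;

        private Point(long x, long y) {
            this.x = x;
            this.y = y;
        }
    }

    private final Map<List<Long>, Point> pool = new HashMap<>();

    // Returns the pooled point for (x, y), creating it on first request.
    Point intern(long x, long y) {
        return pool.computeIfAbsent(Arrays.asList(x, y), key -> new Point(x, y));
    }
}
```

Once every endpoint and intersection point goes through `intern`, two points are the same point exactly when `p == q`, and no coordinate comparison is needed.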
It is key that you avoid using the tree comparators as much as possible.
When checking if segments belong to the same bundle or are members of the three sets you are asking about (i.e. they start at, end at, or intersect an event point on the sweep line), DO NOT USE THE COMPARATOR.
Instead, use the fact that segments belonging to the same bundle can carry some "information property", say in the list, that either points to the event queue when a segment intersects an event point, or points to the successor item in the list if the segment overlaps the successor, or is null otherwise. So you will need some cross-linking between the event queue and the sweep-line status structure. Your sets and bundles are then very fast and easy to find: go to the start or end of the linked list associated with the status tree and go through it item by item with a very simple test.
BOTTOM LINE. Get the sorted sequence / balanced binary tree data structure right and work on that a lot before implementing the rest of Bentley-Ottmann.
This is really the key, and that book does not point it out at all, but unfortunately that isn't its intent, since this implementation is tricky. Also, note that the book augments the navigation tree with an extra link in the internal nodes of the tree that points to the associated leaf nodes. This just makes find a bit faster, but may not be apparent if you are not familiar with leaf trees. A key in a leaf tree is often found twice: at the leaf node and elsewhere in an internal node of the tree.
FINALLY
Packages like LEDA/CGAL use exact arithmetic for things to work well. It took the LEDA developers 10 years to get things right, and that was mostly due to using exact arithmetic. You may be OK with a basic cross-product test used for orientation, but if you need an exact version then you can find Prof. Jonathan Shewchuk's exact arithmetic package on his site.
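For reference, the basic cross-product orientation test can be sketched as below. This is not Shewchuk's package: it assumes integer coordinates and uses the `Math.*Exact` methods so that an overflow raises an `ArithmeticException` instead of silently flipping the sign. The class and method names are made up for the example.

```java
// Orientation of point c relative to the directed line from a to b.
final class Orientation {
    private Orientation() {
    }

    // Returns +1 if c lies to the left of a->b, -1 if to the right, 0 if collinear.
    static int orient(long ax, long ay, long bx, long by, long cx, long cy) {
        long cross = Math.subtractExact(
                Math.multiplyExact(Math.subtractExact(bx, ax), Math.subtractExact(cy, ay)),
                Math.multiplyExact(Math.subtractExact(by, ay), Math.subtractExact(cx, ax)));
        return Long.signum(cross);
    }
}
```

With `double` coordinates the same formula can return the wrong sign for nearly collinear points, which is the situation the exact arithmetic packages are meant to handle.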
I guess your book just left all this out as an "exercise for the reader/student." LOL.

UPDATE: In your posted algorithm from that book, the swap for reversing intersecting segment order is done via delete and then with a re-insert. LEDA uses reverse_items() for these swaps. It's a more efficient way of doing sub-sequence reversals of nodes and items without the use of the comparator. Search for _rs_tree.c to see LEDA source or see below.
// reverse a subsequence of items, assuming that all keys are
// in the correct order afterwards
//
void rs_tree::reverse_items( rst_item pl, rst_item pr )
{
int prio ;
register rst_item ppl = p_item(pl), // pred of pl
ppr = s_item(pr), // succ of pr
ql, qr ;
while( (pl!=pr) && (pl!=ppl) ) { // pl and pr haven't
// met up to now
// swap all of pl and pr except the key
// swap parents
ql = parent(pl) ; qr = parent(pr) ;
if( pl==r_child(ql) )
r_child(ql) = pr ;
else
l_child(ql) = pr ;
if( pr==r_child(qr) )
r_child(qr) = pl ;
else
l_child(qr) = pl ;
parent(pl ) = qr ; parent(pr) = ql ;
// swap left children
ql = l_child(pl) ; qr = l_child(pr) ;
if( ql != qr ) { // at least one exists
l_child(pl) = qr ; parent(qr) = pl ;
l_child(pr) = ql ; parent(ql) = pr ;
}
// swap right children
ql = r_child(pl) ; qr = r_child(pr) ;
if( ql != qr ) { // at least one exists
r_child(pl) = qr ; parent(qr) = pl ;
r_child(pr) = ql ; parent(ql) = pr ;
}
// swap priorities
prio = pl->prio ; pl->prio = pr->prio ;
pr->prio = prio ;
// swap pred-succ-ptrs
s_item(ppl) = pr ; p_item(ppr) = pl ;
ql = pl ; pl = s_item(pl) ; // shift pl and pr
qr = pr ; pr = p_item(pr) ;
s_item(ql) = ppr ; p_item(qr) = ppl ;
ppl = qr ; ppr = ql ; // shift ppl and ppr
}
// correct "inner" pred-succ-ptrs
p_item(ppr) = pl ; s_item(ppl) = pr ;
if( pl==pr ) { // odd-length subseq.
p_item(pl) = ppl ; s_item(pr) = ppr ;
}
}
ADDITIONALLY: Sorted sequence data structures can use AVL trees, ab-trees, red-black trees, splay trees, or skip lists. An ab-tree with a = 2, b = 16 fared best in speed comparison of search-trees used in LEDA**.
** S. Naber. Comparison of search-tree data structures in LEDA. Personal communication.


Combine 2 Lists and set value once there is a match

I have two lists of organizations: one with the organizations followed by the user, and another with all organizations in the database. I wrote two nested loops to set followed to true in the original list for every organization the user is following (which can be determined from the user's organization list).
List<Organization> organizationList = getServiceInstance().getOrganizationService().findOrganizationList();
List<Organization> organizations = getServiceInstance().getOrganizationService().findFollowedOrganizationList(userId);
for (Organization fOrg: organizations) {
for (Organization organization : organizationList) {
if (Objects.equals(fOrg.id, organization.id)) {
fOrg.followed = true;
}
}
}
I believe there is better way to do this.
Try to use a map id->organization. That way you'd have 2 non-nested loops, one for building the map and one to loop over organizations and match the objects.
Example:
//build the map
Map<Integer, Organization> orgsInDB = new HashMap<>();
for( Organization org : organizationList ) {
orgsInDB.put(org.getId(), org );
}
//match
for( Organization org : organizations ) {
Organization orgInDB = orgsInDB.get( org.getId() );
if( orgInDB != null ) {
orgInDB.setFollowed( true );
}
}
The complexity for this drops from O(n*m) to O(n + m).
Edit: As Al-Mothafar correctly pointed out, we should consider the worst case, which would be following all organizations. Thus the complexity of his approach would be O(n^2), while the approach above would be O(n+n) (or, since constant factors are generally left out in big-O notation, just O(n)).
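A self-contained variant of the same idea, using a set of followed ids instead of a map of all organizations. The `Organization` class here is a stripped-down stand-in for the one in the question (only `id` and `followed`), and `FollowedMarker` is a hypothetical helper name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FollowedMarker {
    // Minimal stand-in for the question's Organization class.
    static final class Organization {
        final Integer id;
        boolean followed;

        Organization(Integer id) {
            this.id = id;
        }
    }

    private FollowedMarker() {
    }

    // One pass to collect the followed ids, one pass to flag the full list: O(n + m).
    static void markFollowed(List<Organization> all, List<Organization> followedByUser) {
        Set<Integer> followedIds = new HashSet<>();
        for (Organization org : followedByUser) {
            followedIds.add(org.id);
        }
        for (Organization org : all) {
            org.followed = followedIds.contains(org.id);
        }
    }
}
```

Unlike the map version, this also resets `followed` to false for organizations the user does not follow, which may or may not be what you want.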
I have used CollectionUtils from Apache commons library. It has static intersection, union and subtract methods which are suitable for your case as well. Pretty neat as well.
Nicely explained here with relation to set theory :)
I don't think there is a better way to do this, at least not in terms of performance: you've got to iterate over the organizations somewhere. Readability could be improved by packaging some lines into a private method.
The only suggestion I have is that you might replace Organization.followed (boolean) with Organization.followers (List). If so, you could simply check, whether the list is empty, or not. OTOH, this might slow down database accesses.

Clarification over In-Order traversal in binary search tree

I'm studying Trees in Java, and came across some confusing lines in the book I'm studying. The diagram given for the in-order traversal is this:
The code for the traversal (recursive) is:
private void inOrder(Node localRoot) {
if (localRoot != null) {
inOrder(localRoot.leftChild);
System.out.println(localRoot.iData + " ");
inOrder(localRoot.rightChild);
}
}
The lines I'm confused at are:
Now we’re back to inOrder(A), just returning from traversing A’s left
child. We visit A and then call inOrder() again with C as an argument,
creating inOrder(C). Like inOrder(B), inOrder(C) has no children, so
step 1 returns with no action, step 2 visits C, and step 3 returns
with no action. inOrder(B) now returns to inOrder(A).
However,
inOrder(A) is now done, so it returns and the entire traversal is
complete. The order in which the nodes were visited is A, B, C; they
have been visited inorder. In a binary search tree this would be the
order of ascending keys.
I've highlighted the parts where I'm stuck. First, I think in the third step, inOrder(C) [and not inOrder(B)] returns to inOrder(A). And second, the order in which the nodes were visited should be B -> A -> C.
Please help me out!
Yes, you are correct on both counts. These seem to be typos, or errata.
As a sidenote, I recognized the diagram style in your post because I learned data structures years ago from the same book (Lafore). Unfortunately it does not seem that he has a list of errata published anywhere, which is disappointing, since most authors do strive to do this.
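To check the corrected order, here is a small self-contained sketch of the same traversal on the three-node tree from the question (A at the root, B as left child, C as right child). The field names follow the code in the question; the `InOrderDemo` class and the `traverseSample` helper are made up for this example, and the output is collected in a string instead of being printed.

```java
final class InOrderDemo {
    static final class Node {
        final String iData;
        Node leftChild;
        Node rightChild;

        Node(String iData) {
            this.iData = iData;
        }
    }

    private InOrderDemo() {
    }

    // Same three steps as in the book: left subtree, visit the node, right subtree.
    static void inOrder(Node localRoot, StringBuilder out) {
        if (localRoot != null) {
            inOrder(localRoot.leftChild, out);
            out.append(localRoot.iData).append(' ');
            inOrder(localRoot.rightChild, out);
        }
    }

    // Builds the tree A(B, C) and returns the visit order as a string.
    static String traverseSample() {
        Node a = new Node("A");
        a.leftChild = new Node("B");
        a.rightChild = new Node("C");
        StringBuilder out = new StringBuilder();
        inOrder(a, out);
        return out.toString().trim();   // "B A C"
    }
}
```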

All possible paths

I am currently working on an AI for playing the game Dots (link). The objective is to remove as many dots as possible by connecting similarly colored dots with a line. I've gone through the board and grouped each set of neighboring dots with the same color. The groups all currently share the same highlight color (black). So, for example, the four red dots in the top left form a single group, as do the three yellow dots on the top right.
I need to calculate every possible path through one of these groups. Can anyone think of a good algorithm? How could I avoid creating duplicate paths?
I've heard that a slightly modified DFS would be good in this situation. However, the paths are allowed to cross at nodes, but cannot reuse edges. How can I modify DFS accordingly?
Here's some pseudo code to get you started. It's how I would probably do it. Using edges instead of nodes solves the situation with crossing paths neatly, but retrieving edges is more difficult than nodes. You need to map the edge indexes to the node indexes.
You will get every path two times, since a path can be traversed from two directions.
If the dot groups grow large, consider pruning the least interesting paths. The memory requirement grows exponentially as 4^n where n is the number of dots in the group. I can't think of a good way to add incomplete paths without allowing for duplicates, but perhaps you're not interested in paths ending early?
private LinkedList<Edge> recurse(LinkedList<Edge> path) {
Edge last = path.getLast();
Edge right = <get Edge to the right of last>;
Edge bottom = <get Edge below last>;
Edge left = <get Edge to the left of last>;
Edge top = <get Edge above last>;
if( right && !path.contains(right) ) {
LinkedList<Edge> ps = path.clone(); // NOTE: check if the built-in clone() function does a shallow copy
ps.addLast( right );
paths.add( recurse(ps) );
}
if( bottom && !path.contains(bottom) ) {
...
}
if( left && !path.contains(left) ) {
...
}
if( top && !path.contains(top) ) {
...
}
return path;
}
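For comparison, here is a runnable sketch of the same idea on a generic undirected graph rather than the Dots grid: a DFS that marks edges, not nodes, as used, so paths may revisit a node but never reuse an edge. The `TrailEnumerator` class, the edge-list representation, and the choice to record every partial path are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Enumerates every path that uses each edge at most once (nodes may repeat).
final class TrailEnumerator {
    private final int nodeCount;
    private final int[][] edges;                 // edges[e] = {u, v}, undirected
    private final List<List<Integer>> incident;  // incident.get(v) = ids of edges touching v
    private final boolean[] used;                // used[e] is true while edge e is on the current path
    private final List<List<Integer>> trails = new ArrayList<>();

    TrailEnumerator(int nodeCount, int[][] edges) {
        this.nodeCount = nodeCount;
        this.edges = edges;
        this.used = new boolean[edges.length];
        this.incident = new ArrayList<>();
        for (int v = 0; v < nodeCount; v++) {
            incident.add(new ArrayList<>());
        }
        for (int e = 0; e < edges.length; e++) {
            incident.get(edges[e][0]).add(e);
            if (edges[e][1] != edges[e][0]) {
                incident.get(edges[e][1]).add(e);
            }
        }
    }

    // Returns all paths with at least one edge, each as a list of node ids.
    List<List<Integer>> allTrails() {
        trails.clear();
        for (int start = 0; start < nodeCount; start++) {
            List<Integer> path = new ArrayList<>();
            path.add(start);
            dfs(start, path);
        }
        return trails;
    }

    private void dfs(int node, List<Integer> path) {
        for (int e : incident.get(node)) {
            if (used[e]) {
                continue;                        // an edge may appear only once per path
            }
            used[e] = true;
            int next = edges[e][0] == node ? edges[e][1] : edges[e][0];
            path.add(next);
            trails.add(new ArrayList<>(path));   // record a copy of the current path
            dfs(next, path);
            path.remove(path.size() - 1);        // backtrack
            used[e] = false;
        }
    }
}
```

As noted above, each open path shows up once per direction. Because the search backtracks, only the paths that are kept cost memory, not the search itself.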

Graph traversal - finding and returning the shortest distance

I am using a Breadth first search in a program that is trying to find and return the shortest path between two nodes on an unweighted digraph.
My program works like the Wikipedia page pseudo code:
The algorithm uses a queue data structure to store intermediate results as it traverses the graph, as follows:
Enqueue the root node
Dequeue a node and examine it
If the element sought is found in this node, quit the search and return a result.
Otherwise enqueue any successors (the direct child nodes) that have not yet been discovered.
If the queue is empty, every node on the graph has been examined – quit the search and return "not found".
If the queue is not empty, repeat from Step 2.
So I have been thinking of how to track number of steps made but I am having trouble with the limitations of java (I am not very knowledgeable of how java works). I originally was thinking that I could create some queue made up of a data type I made that stores steps and nodes, and as it traverses the graph it keeps track of the steps. If ever the goal is reached just simply return the steps.
I don't know how to make this work in java so I had to get rid of that idea and I moved on to using that wonky Queue = new LinkedList implementation of a queue. So basically I think it is a normal integer queue, I couldn't get my data type I made to work with it.
So now I have to find a more basic approach so I tried to use a simple counter, this doesn't work because the traversal algorithm searches down many paths before reaching the shortest one so I had an idea. I added a second queue that tracked steps, and I added a couple counters. Any time a node is added to the first queue I add to the counter, meaning I know that I am inspecting new nodes so I am not a distance further away. Once all those have been inspected I can then increase the step counter and any time a node is added to the first queue I add the step value to the step queue. The step queue is managed just like the node queue so that when the goal node is found the corresponding step should be the one to be dequeued out.
This doesn't work though and I was having a lot of problems with it, I am actually not sure why.
I deleted most of my code in panic and frustration but I will start to try and recreate it and post it here if anyone needs me to.
Were any of my ideas close and how can I make them work? I am sure there is a standard and simple way of doing this as well that I am not clever enough to see.
Code would help. What data structure are you using to store the partial or candidate solutions? You say you're using a queue to store nodes to be examined, but really the objects stored in the queue should wrap some structure (e.g. a List) that indicates the nodes traversed to get to the node to be examined. So, instead of simple Nodes being stored in the queue, some more complex object is needed to carry the information necessary to know the complete path taken to that point. A simple node only has information about itself and its children. But if you're examining node X, you also need to know how you arrived at node X. Just knowing node X isn't enough, and the only way (that I know of) to know the path taken to node X is to store the path in the object that represents a "partial solution" or "candidate solution". If this is done, then finding the length of the path is trivial, because it's just the length of this list (or whichever data structure is chosen). Hope I'm making some sense here. If not, post code and I'll take a look.
EDIT
These bits of code help show what I mean (they're by no means complete):
public class Solution {
List<Node> path;
}
Queue<Solution> q;
NOT
Queue<Node> q;
EDIT 2
If all you need is the length of the path, and not the path, per se, then try something like this:
public class Solution {
Node node; // whatever represents a node in your algorithm.
int len; // the length of the path to this node.
}
// Your queue:
LinkedList<Solution> q;
With this, before enqueuing a candidate solution (node), you do something like:
Solution sol = new Solution();
sol.node = childNodeToEnqueue;
sol.len = parentNode.len + 1;
q.add(sol);
The easiest way to track distance during a traversal is to add a simple array (or a map if your vertices are not indexed by integers).
Here is pseudo code algorithm:
shortest_path(g, src, dst):
q = new empty queue
distances = int array of length order of g
for i = 0 to order: distances[i] = -1
distances[src] = 0
enqueue src in q
while q is not empty:
cur = pop next element in q
if cur is dst: return distances[dst]
foreach s in successors of cur in g:
if distances[s] == -1:
distances[s] = distances[cur] + 1
enqueue s in q
return not found
Note: order of a graph is the number of vertices
You don't need special data structures; the queue can just contain the vertices' ids (probably integers). In Java, the LinkedList class implements the Queue interface, so it's a good candidate for your queue. For the distances array, if your vertices are identified by integers an integer array is enough; otherwise you need some kind of map.
You can also separate the vertex tainting (the -1 in my algo) using a separate boolean array or a set, but it's not really necessary and will waste some space.
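The pseudo code above translates to Java roughly as follows. This is a sketch under a few assumptions: vertices are numbered from 0, the graph is given as an array of successor arrays, and -1 is returned for "not found". The class name `BfsDistance` is made up for the example.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.Queue;

final class BfsDistance {
    private BfsDistance() {
    }

    // Returns the number of edges on a shortest path from src to dst, or -1 if dst is unreachable.
    // successors[v] lists the vertices reachable from v by one edge.
    static int shortestPath(int[][] successors, int src, int dst) {
        int[] distances = new int[successors.length];
        Arrays.fill(distances, -1);              // -1 marks "not discovered yet"
        distances[src] = 0;

        Queue<Integer> q = new LinkedList<>();
        q.add(src);
        while (!q.isEmpty()) {
            int cur = q.remove();
            if (cur == dst) {
                return distances[cur];
            }
            for (int s : successors[cur]) {
                if (distances[s] == -1) {
                    distances[s] = distances[cur] + 1;
                    q.add(s);
                }
            }
        }
        return -1;                               // queue exhausted without reaching dst
    }
}
```

For example, with `int[][] g = {{1, 2}, {3}, {3}, {4}, {}, {}}`, the call `BfsDistance.shortestPath(g, 0, 4)` returns 3.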
If you want the path, you can also do that with a simple parent array: for each vertex you store its parent in the traversal, so just add parent[s] = cur when you enqueue the successor. Then retrieving the path (in reverse order) is as simple as this:
path = new empty stack
cur = dst
while cur != src:
push cur in path
cur = parent[cur]
push src in path
And there you are …
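Put together in Java, the parent-array variant might look like the sketch below. As assumptions for the example, vertices are numbered from 0, the graph is an array of successor arrays, an empty list means "not found", and `parent[src] = src` is used as the "discovered" marker for the source. The class name `BfsPath` is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

final class BfsPath {
    private BfsPath() {
    }

    // Returns the vertices of a shortest path from src to dst (inclusive),
    // or an empty list if dst cannot be reached.
    static List<Integer> shortestPath(int[][] successors, int src, int dst) {
        int[] parent = new int[successors.length];
        Arrays.fill(parent, -1);                 // -1 marks "not discovered yet"
        parent[src] = src;

        Queue<Integer> q = new LinkedList<>();
        q.add(src);
        while (!q.isEmpty()) {
            int cur = q.remove();
            if (cur == dst) {
                break;
            }
            for (int s : successors[cur]) {
                if (parent[s] == -1) {
                    parent[s] = cur;             // remember how we reached s
                    q.add(s);
                }
            }
        }
        if (parent[dst] == -1) {
            return Collections.emptyList();
        }

        // Walk back from dst to src, prepending so the result runs src -> dst.
        LinkedList<Integer> path = new LinkedList<>();
        for (int cur = dst; cur != src; cur = parent[cur]) {
            path.addFirst(cur);
        }
        path.addFirst(src);
        return path;
    }
}
```

The number of steps is then `path.size() - 1` whenever the path is not empty.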

trie building in java [duplicate]

This question already has answers here:
Trie implementation
(6 answers)
Closed 9 years ago.
How do I construct a tree from a file ? I want to be able to read them from a file and then add to appropriate level
It seems to me that you are trying to implement trie.
Look here for a nice implementation in java: http://www.cs.duke.edu/~ola/courses/cps108/fall96/joggle/trie/Trie.java
If you have only two levels in the tree before the leaves (the actual words), you can simply start with arrays of 28 elements and translate the letters to array indexes (i.e. a==1, b==2, etc.). The elements of the last-level array can be a set/list that contains the full words. You can create the arrays and lists lazily (i.e. create the root array but leave nulls for the other arrays and word lists, then create an array/list when/if needed).
Am I reading rules you should follow correctly?
P.S. I think that using arrays with full size each would not be too wasteful on space as well it should be very fast to address
Update: @user1747976, well, each array would take around 28*4 or 28*8 bytes + 12 bytes overhead. Hopefully you use compressed oops, so it is 28*4+12=124 bytes per array. Now it depends on whether you want to be memory efficient or processing efficient. To be memory efficient, you can use some kind of hashmap instead of arrays, but I'm not sure the additional overhead will be less than what you use with arrays. Processing will be worse for sure, though.
You need to run a loop a number of times depending on the required tree depth. Some ugly pseudo code for inserting into the tree:
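Object[] root = new Object[28];
String word = "something";
Object[] pos = root;
int wordInd = 1;
for (int i = 1; i <= TREE_DEPTH; i++) {
    // letter() and letterInd() are helpers: the wordInd-th letter of word, and its array index
    int targetpos = letterInd(letter(wordInd, word));
    if (i == TREE_DEPTH) {
        // last level: the slot holds the set of full words
        if (pos[targetpos] == null) pos[targetpos] = new HashSet<String>();
        ((Set<String>) pos[targetpos]).add(word);
        break;
    } else {
        // inner level: the slot holds the next array, created lazily
        if (pos[targetpos] == null) pos[targetpos] = new Object[28];
        wordInd++;
        pos = (Object[]) pos[targetpos];
    }
}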
Similar loop you can use for retrieving words.
Adding
Starting at the root, search for the first (or current) letter. If that letter is found then move to that node and search for the next letter. If the letter is not found, search for a word that matches the current letter, if there is a similar word then add the current letter as a new node and move both words under that, otherwise add the word.
Note: This will result in a tree that is more optimized for searches than the tree shown in the example (adamant and adapt will be grouped under another 'a' node).
Update: Take a look at the Wikipedia article for Trie
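For reference, a plain trie with one node per letter can be sketched in a few lines. This is not the Duke implementation linked above, and it does not reproduce the fixed-depth array layout or the word-merging scheme described earlier: it assumes a map of children per node, and the `SimpleTrie` class and its method names are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class SimpleTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord;                          // true if a word ends at this node
    }

    private final Node root = new Node();

    // Walks down from the root, creating missing nodes letter by letter.
    void insert(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.children.computeIfAbsent(c, key -> new Node());
        }
        cur.isWord = true;
    }

    // True only if exactly this word was inserted.
    boolean contains(String word) {
        Node node = find(word);
        return node != null && node.isWord;
    }

    // True if at least one inserted word starts with the given prefix.
    boolean hasPrefix(String prefix) {
        return find(prefix) != null;
    }

    private Node find(String s) {
        Node cur = root;
        for (char c : s.toCharArray()) {
            cur = cur.children.get(c);
            if (cur == null) {
                return null;
            }
        }
        return cur;
    }
}
```

Reading words from a file then comes down to calling `insert` once per line.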
