I am creating a tree in Java to model the extensive form of a game for an AI. The tree will be a 25-ary tree (a tree in which each node has at most 25 children) because at each turn of the game there are 25 different moves. Because the number of new branches that have to be created in each new layer of the tree is 25^n, I'm very concerned with making this efficient. (I intend to remorselessly cut off branches to keep them from growing, so that things don't get bogged down.) What is the best way to model such a tree when efficiency is such a concern? My first impression is to have a node object where each node has a parent node and an array of child nodes, but this means creating a lot of objects. Ultimately these are my questions:
Is this the fastest way to create and manage my tree?
What is a good way to figure out how much time any given algorithm or process in a program is going to take? (the only one I've thought of so far is to create a date before the process and then after and compare the # of milliseconds that have passed)
Any other thoughts are also welcome. My question implies and is related to a great number of other questions, I would expect. If I have been ambiguous or unclear, please comment to let me know instead of down-voting and storming off, as that isn't productive.
Realistically, the way you described is the best approach. It'll perform reasonably well compared to anything else you could do and will be straightforward to implement.
Time and again people ask questions about how to do something "efficiently". The best answer is nearly always, "don't even bother trying". Unless your improvement is an algorithmic one, it's unlikely to make much difference anyway, and especially in a case like this, the extra effort and complexity isn't worth whatever minuscule gain you might be able to achieve.
Putting it another way, and to borrow a quote (though I can't remember the originator), the first rule of optimization is: don't.
Having said that, if you really feel the need to squeeze every last drop of speed, you could try caching and re-using objects (instead of discarding them completely, keep track of them in a free object store, and then when you need to create a new object, first check the free object store to check if there is an existing one). As always, you'll need to measure performance before and after to see if it really helps (chances are it won't help much, unless physical memory is really constrained, in which case garbage collection can become expensive).
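To make the "free object store" idea concrete, here is a minimal sketch. The class names (PooledNode, NodePool, PoolTiming) and the fields are made up for illustration, and the timing helper is only a rough wall-clock measurement for the before/after comparison, not a proper benchmark.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Hypothetical node type for illustration; the field names are assumptions.
class PooledNode {
    PooledNode parent;
    final PooledNode[] children = new PooledNode[25];

    // Clear all links so a recycled node carries no stale state.
    void reset() {
        parent = null;
        Arrays.fill(children, null);
    }
}

// The "free object store": discarded nodes wait here to be reused.
class NodePool {
    private final ArrayDeque<PooledNode> free = new ArrayDeque<>();

    // Reuse a discarded node if one is available, otherwise allocate.
    PooledNode acquire() {
        PooledNode n = free.poll();
        return (n != null) ? n : new PooledNode();
    }

    // Hand a pruned node back to the pool instead of dropping it.
    void release(PooledNode n) {
        n.reset();
        free.push(n);
    }

    int available() {
        return free.size();
    }
}

class PoolTiming {
    // Elapsed wall-clock time of one run, in nanoseconds.
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }
}
```

You would time the same tree-building workload once with plain `new` and once with the pool, and keep the pool only if the numbers actually improve.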
I agree with the previous comment about only optimizing once you have implemented the rest of the application.
On the other hand, I do realize a few things that may be of importance:
Branching factor of 25: Although not ridiculously huge, it is still large with respect to other problems. For a tree, you will definitely have to have a list for each node to indicate the list of SubNodes. You can do this either by making a Node class which has a collection of nodes within it, or have an external Map that maps a given node to a list of children nodes.
Removing and adding of elements will be done: This lends itself to a LinkedList implementation of the stored children since you don't want to perform costly removes and adds. A HashSet may work also, but the problem is that you may need more memory.
Iteration of the elements may or may not be done: If you want to iterate over the entire list at each step, LinkedLists are fine. If you want to prioritize the nodes then you may be saving memory by using a priority queue data structure. Priority queues are especially helpful if you are going to implement a heuristic function and evaluate which child to move to at any given node.
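For the "Node class which has a collection of nodes within it" option, a rough sketch could look like the following. Note that it uses a fixed 25-slot array indexed by move rather than the LinkedList or priority queue discussed above; that is my own assumption (it only works if moves can be numbered 0 to 24), and GameNode and its members are hypothetical names.

```java
// Hypothetical node for a tree with at most 25 moves per position.
class GameNode {
    static final int BRANCHING = 25;

    final GameNode parent;        // null for the root
    final int move;               // index of the move that led here, -1 for the root
    private GameNode[] children;  // allocated lazily, so leaves stay small

    GameNode(GameNode parent, int move) {
        this.parent = parent;
        this.move = move;
    }

    // Return the child for a move, creating it on first use.
    GameNode child(int move) {
        if (children == null) children = new GameNode[BRANCHING];
        if (children[move] == null) children[move] = new GameNode(this, move);
        return children[move];
    }

    // Cut off a branch; the subtree can be garbage collected once nothing else refers to it.
    void prune(int move) {
        if (children != null) children[move] = null;
    }

    boolean hasChild(int move) {
        return children != null && children[move] != null;
    }
}
```

With an array, adding and pruning a child are both a single slot write, so the add/remove cost that motivates a LinkedList does not arise; the price is 25 references per expanded node.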
That's all I have so far, but I'll keep updating if I think of more things, or if you update your content.
I know priority queues tend to use heaps but what is the point of a priority queue when they basically seem the same as heaps? I initially thought all priority queues use hash maps to keep track of all object's locations in the heap, making finding and updating/deleting said object easier. However, I have used Java's priority queue and you have to manually iterate over it to update or delete objects not at the root. It seems odd to have a priority queue that appears to literally just be a heap with nothing else special about it.
It might help to reason by analogy here:
List is to dynamic array as PriorityQueue is to binary heap.
That is, the abstract idea of a list (a sequence of things starting at position zero where items can be inserted and removed) is a nice, high-level concept, while a dynamic array (an array along with a capacity that doubles or 1.5x’s in size if extra space is needed) is one possible way of implementing a list. If you’re using a list, you can just think “oh, it’s a sequence, and I can put things places” without worrying how that sequence is actually represented. On the other hand, working with a dynamic array requires you to track which array elements are valid versus which ones don’t actually get used, you need to manually transfer things over when there’s no more space and think carefully about your growth strategy, etc. The distinction here is at what level you’re viewing things. If you just need “a sequence,” think “list.” If you need to build a type from scratch representing a sequence, think “dynamic array.”
This is basically what’s going on with priority queues versus binary heaps. A priority queue abstractly represents the idea of “I can put things in and they’ll come back in sorted order.” A binary heap is one specific possible way of implementing a priority queue. When working with an abstract priority queue, you can focus your thoughts purely on questions like “what elements do I want to add?” and “how do I rank them?” When working with a binary heap, you have to think about things like “do I use one-indexing or zero indexing?” and “what’s the formula for identifying the children of a node at index k?” If all you need is the ability to put things in a bag and take them out in sorted order, you can use a priority queue without worrying about how it works. If you need to build one from scratch, you can use a binary heap.
Going back to the list versus dynamic array analogy: there are many types you can use to represent lists. Dynamic arrays are one, but you could also use a circular buffer (good if you add or remove from the ends) or a linked list (good if items get moved around between lists). Similarly, there are many ways you can implement a priority queue. Binary heaps are one option, but there’s also pairing heaps, binomial heaps, etc. Keeping the relevant abstraction in focus - I just want a sequence of things, I just want a way to retrieve things in sorted order - means that you don’t need to worry as much about how things work when what you care about is what operations you want to do.
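To show the kind of detail the binary heap level forces you to think about, here is a minimal zero-indexed min-heap of ints. It is a sketch for illustration only (IntMinHeap is a made-up name, and this is not how java.util.PriorityQueue is written), but it contains the index formulas for parents and children mentioned above.

```java
import java.util.Arrays;

// Minimal zero-indexed binary min-heap of ints, for illustration only.
class IntMinHeap {
    private int[] data = new int[16];
    private int size;

    void add(int value) {
        if (size == data.length) data = Arrays.copyOf(data, size * 2);
        data[size] = value;
        int i = size++;
        // Sift up: the parent of index i is (i - 1) / 2.
        while (i > 0 && data[(i - 1) / 2] > data[i]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    int poll() {
        if (size == 0) throw new IllegalStateException("heap is empty");
        int min = data[0];
        data[0] = data[--size];
        int i = 0;
        // Sift down: the children of index i are 2i + 1 and 2i + 2.
        while (true) {
            int left = 2 * i + 1, right = left + 1, smallest = i;
            if (left < size && data[left] < data[smallest]) smallest = left;
            if (right < size && data[right] < data[smallest]) smallest = right;
            if (smallest == i) break;
            swap(i, smallest);
            i = smallest;
        }
        return min;
    }

    boolean isEmpty() {
        return size == 0;
    }

    private void swap(int a, int b) {
        int t = data[a];
        data[a] = data[b];
        data[b] = t;
    }
}
```

A caller only ever sees add, poll and isEmpty, which is the priority queue abstraction; everything else is the heap.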
In my personal opinion you are right: Java's PriorityQueue is a heap. Since Java is a high-level programming language, it is reasonable for it to provide implementations of the common, standard algorithms; most of the time we focus on business logic and on getting the job done faster. So we don't want to spend too much time building a priority queue from the ground up, and besides, doing it yourself is tedious and error-prone.
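If you want to update or delete an object without iterating over the queue manually, you can remove it and add it back:
Object updatedObject; // the element whose priority is changing
priorityQueue.remove(updatedObject); // returns a boolean, not the element
// change the object's priority here, then re-insert it
priorityQueue.add(updatedObject);
This is not efficient when updates are frequent, because remove(Object) scans the queue in linear time. A Fibonacci heap is an alternative data structure that handles this better.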
It seems odd to have a priority queue that appears to literally just be a heap with nothing else special about it.
Why?
Nothing about the name PriorityQueue promises anything more than the ability to put items in one end and get them out the other in sorted-by-priority order. That's also basically the definition of a heap, which is why a heap makes an ideal data structure to implement a priority queue.
So, essentially, the Java Collections Framework designers implemented a heap. Only instead of calling it a Heap, they called it a PriorityQueue. End of story. As the song lyric goes: "Who could ask for anything more?"
Java's PriorityQueue can act as either a min-heap or a max-heap, depending on how you construct it: with the natural ordering its head is always the minimum, and with a reversed comparator its head is always the maximum.
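A small sketch of the two constructions, assuming Integer elements and at least one value passed in (HeapOrderDemo and its method names are made up):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class HeapOrderDemo {
    // Natural ordering: the head of the queue is the smallest value.
    static int headOfMinHeap(int... values) {
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        for (int v : values) minHeap.add(v);
        return minHeap.peek();
    }

    // Reversed comparator: the head of the queue is the largest value.
    static int headOfMaxHeap(int... values) {
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
        for (int v : values) maxHeap.add(v);
        return maxHeap.peek();
    }
}
```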
I need help with an optimized solution for the following problem: http://acm.ro/prob/probleme/B.pdf.
Depending on the cost, I either traverse the graph using only new edges or using only
old edges. Both of them work, but I need to pass the tests in a limited number of milliseconds,
and the algorithm for the old edges is dragging me down.
I need a way to optimize this; any suggestions are welcome.
EDIT: for safety reasons I am taking the algorithm down. I'm sorry, I'm new, so I don't
know what I need to do to delete the post now that it has answers.
My initial algorithmic suggestion relied on an incorrect reading of the problem. Further, a textbook breadth-first search or Dijkstra on a graph of this size is unlikely to finish in a reasonable amount of time. There's likely an early-termination trick that you can employ for large cases; the thread Niklas B. linked to suggests several (as well as some other approaches). I couldn't find an early-termination trick that I could prove worked.
These micro-optimisation notes are still relevant, however:
I'd suggest not using Java's built-in Queue container for this (or any other built-in Java containers for anything else in a programming contest). It turns your 4-byte int into a gargantuan Integer structure. This is very probably where your blowup is coming from. You can use a 500000-long int[] to store the data in your queue and two ints for the front and back of the queue instead. In general, you want to avoid instantiating Objects in Java contest programming because of their overhead.
Likewise, I'd suggest representing the edges of the graph as either a single big int[] or a 500000-long int[][] to cut down on that piece of overhead.
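As a rough illustration of both suggestions, here is a plain breadth-first search that keeps its queue in an int[] with two indices and packs the adjacency lists into one flat int[]. It is a generic sketch, not a solution to the linked problem: ArrayBfs, the edge-list input format and the unweighted, undirected graph are all assumptions.

```java
import java.util.Arrays;

class ArrayBfs {
    // Unweighted shortest distances from source; -1 marks unreachable vertices.
    // edges[i] = {u, v} is an undirected edge; vertices are numbered 0..n-1.
    static int[] distances(int n, int[][] edges, int source) {
        // Pack all adjacency lists into one flat array: the neighbours of u
        // live in adj[start[u]] .. adj[start[u + 1] - 1].
        int[] start = new int[n + 1];
        for (int[] e : edges) { start[e[0] + 1]++; start[e[1] + 1]++; }
        for (int i = 0; i < n; i++) start[i + 1] += start[i];
        int[] adj = new int[2 * edges.length];
        int[] fill = Arrays.copyOf(start, n);
        for (int[] e : edges) {
            adj[fill[e[0]]++] = e[1];
            adj[fill[e[1]]++] = e[0];
        }

        int[] dist = new int[n];
        Arrays.fill(dist, -1);
        int[] queue = new int[n];   // each vertex is enqueued at most once
        int head = 0, tail = 0;     // front and back of the queue
        dist[source] = 0;
        queue[tail++] = source;
        while (head < tail) {
            int u = queue[head++];
            for (int k = start[u]; k < start[u + 1]; k++) {
                int v = adj[k];
                if (dist[v] == -1) {
                    dist[v] = dist[u] + 1;
                    queue[tail++] = v;
                }
            }
        }
        return dist;
    }
}
```

No Integer is ever boxed in the search loop, and the only allocations are a handful of arrays made up front.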
I only saw one queue in your code. That means you are searching from one direction only.
You may want to take a look at
Bidirectional Search
I'm building tree pagination in JSF 1.2 and RichFaces 3.3.2, because I have a lot of tree nodes (something like 80k) and it's slow.
So, as a first attempt, I created a HashMap from the page to the list of nodes on that page.
But the performance isn't good enough...
So I was wondering if there is something faster than a HashMap, maybe a List of Lists or something.
Does anyone have experience with this? What can I do?
Thanks in advance.
EDIT.
The big problem is that I have to validate the users' permissions on the child nodes of the tree. I know that this is the big problem: the validation is slow because I have to go inside the nodes, and I have no good way to know whether the user has permission on a 10th-level node without iterating over all of them. On top of this, the same tree is used in more places...
The basic reason I was doing this pagination is that the client side would be very slow because of the structure generated by RichFaces: with so many tr's and td's, the browser just goes crazy.
So, unfortunately, I have to load all the nodes and paginate only on the client side, and I need to know which structure is faster to iterate over...
Sorry for my bad English.
A hash map is the fastest data structure if you want to get all nodes for a page. The list of nodes can be fetched in constant time (O(1)), while searching a list for the page takes O(n) time (n = number of pages; faster on sorted lists, but never getting near O(1)).
Which operations on your data structure are too slow? That's what you have to analyse before you start optimizing.
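For reference, the page-to-nodes map the question describes might be built roughly like this. PageIndex and paginate are made-up names, pages are numbered from 0, and pageSize is assumed to be positive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PageIndex {
    // Split nodes into pages of pageSize elements; page numbers start at 0.
    static <T> Map<Integer, List<T>> paginate(List<T> nodes, int pageSize) {
        Map<Integer, List<T>> pages = new HashMap<>();
        for (int from = 0; from < nodes.size(); from += pageSize) {
            int to = Math.min(from + pageSize, nodes.size());
            // Copy the slice so the page does not depend on the source list.
            pages.put(from / pageSize, new ArrayList<>(nodes.subList(from, to)));
        }
        return pages;
    }
}
```

Looking a page up is then a single `pages.get(pageNumber)`. Since the page numbers here are consecutive integers, an ArrayList indexed by page number would also give constant-time access, so the choice between the two is unlikely to be where the time goes.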
It's probably more due to the fact that JSF is a performance pig than a data structure choice. The one attempt I've seen to create a JSF app could be timed with a sundial.
You're making a mistake by guessing about solutions without more knowledge about the root cause. I'd recommend that you profile your app to see where the time is being spent.
The data structure to use always depends on how you need to store the data and how you need to access it. HashMap<K, V> is supposed to have constant time complexity for accessing the value, given the key. When you call get(key), the hashCode() of the key is computed and used to retrieve the related value. Unless you've got different keys that have the same hash code (in which case you may be doing something wrong: it is not mandatory, but different objects should have different hash codes, at least in the majority of cases), this is usually fast.
Searching for an element in a plain list requires scanning the list, which will (almost) always be slower than computing a hash code.
If you need to associate values with keys, a Map is the way. And HashMap should be fast enough.
I don't know too much about JSF, but I think - if the data structure and access pattern is the one that a Map is designed for - the problem is not the HashMap itself.
I would solve this with JavaScript/Ajax calls that fetch the child nodes on demand.
In Java, I'm creating a SortedSet from a list which is always going to be ordered (but is only of type ArrayList). I figure adding the elements one by one is going to have pretty poor performance (in the case of an AVL tree, for example), as it will have to reorder the tree a lot.
My question is: how should I be creating this set, so that building a balanced tree is as fast as possible?
the specific implementation i was planning on using was either IntRBTreeSet or IntAVLTreeSet from http://fastutil.dsi.unimi.it/docs/it/unimi/dsi/fastutil/ints/IntSortedSet.html
After writing this up, I think the poor performance won't affect me too much anyway (too small an amount of data), but I'm still interested in how it would be done in the general case.
A set with a tree implementation would have the middle element of your list at the top. So the algorithm would be as follows:
find the middle element of the List
insert it into set
repeat for both sub-lists to the left and to the right of the middle element
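The three steps above can be sketched as a small recursive helper that returns the elements in "middle first" order, ready to be inserted one by one. MiddleFirst and its methods are hypothetical names, and the input list is assumed to be sorted already.

```java
import java.util.ArrayList;
import java.util.List;

class MiddleFirst {
    // Returns the elements of a sorted list in "middle first" insertion order.
    static <T> List<T> insertionOrder(List<T> sorted) {
        List<T> out = new ArrayList<>(sorted.size());
        collect(sorted, 0, sorted.size() - 1, out);
        return out;
    }

    // Emit the middle of sorted[lo..hi], then recurse into both halves.
    private static <T> void collect(List<T> sorted, int lo, int hi, List<T> out) {
        if (lo > hi) return;
        int mid = lo + (hi - lo) / 2;
        out.add(sorted.get(mid));
        collect(sorted, lo, mid - 1, out);
        collect(sorted, mid + 1, hi, out);
    }
}
```

For the list 1..7 this yields 4, 2, 1, 3, 6, 5, 7. Inserted in that order, even a plain unbalanced binary search tree ends up balanced; a self-balancing tree would rebalance itself either way, so there the gain is at most fewer rotations.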
Red-Black trees are a good choice for the general case, and they have very fast inserts. See Chris Okasaki's paper for an elegant and fast implementation. The Functional Java library has a generic Set class that is backed by a red-black tree implemented according to this paper.
With all the discussion of using a Set, it occurs to me that maybe the problem could be re-stated. Why use a Set at all? If you just want to check for membership, and your source list is sorted, then do a binary search for the object - this will be as fast (and probably faster) than any n-tree you can envision, and it's not that tough to code.
So, envision an OrderedListSet interface that just wraps the underlying List object. As long as the comparator used to order the list is also used for the binary search, this should be pretty straightforward.
All Set operations will start with a getIndex(Object ob) call, then the appropriate action is taken on the List.
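A minimal sketch of that wrapper, using Collections.binarySearch for getIndex. It is written as a small class rather than a full java.util.Set implementation, and everything beyond the names OrderedListSet and getIndex is my own assumption; the list passed in must be mutable and already sorted by the same comparator.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical wrapper around an already sorted list.
class OrderedListSet<T> {
    private final List<T> list;  // must already be sorted by comparator
    private final Comparator<? super T> comparator;

    OrderedListSet(List<T> sortedList, Comparator<? super T> comparator) {
        this.list = sortedList;
        this.comparator = comparator;
    }

    // Index of the element, or (-(insertion point) - 1) if it is absent.
    int getIndex(T ob) {
        return Collections.binarySearch(list, ob, comparator);
    }

    boolean contains(T ob) {
        return getIndex(ob) >= 0;
    }

    // Insert at the position that keeps the list sorted; false if already present.
    boolean add(T ob) {
        int i = getIndex(ob);
        if (i >= 0) return false;
        list.add(-i - 1, ob);
        return true;
    }

    int size() {
        return list.size();
    }
}
```

Membership tests are O(log n) on a random-access list such as ArrayList, but add still has to shift elements, so this suits data that is mostly read.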
Do you have a performance problem with the simple approach of just inserting the elements as they come?
If not, don't optimize.
The built-in TreeSet (http://java.sun.com/j2se/1.4.2/docs/api/java/util/TreeSet.html) class uses a red-black tree as its backing tree (and, as has been noted, red-black trees are quite fast for inserts). Here's good info on Red-Black trees (they don't have the problem of the typical binary tree implementation when inserting data that is mostly ordered already).
If you are dealing with huge data sets (big enough to require disk based backing, or significant paging file swap), then a B+Tree is a very good option (see JDBM for a Java based version of self-balancing B+Tree - it doesn't implement Set, but could be used that way if desired).
Depending on how your application is actually using this data, you might want to consider the GlazedLists library and make your lists 'live'. If all you are doing is static analysis, then this may be overkill, but it is an absolutely fantastic way of working with list based data. Definitely worth reading about.
I'm writing a Java Tree in which tree nodes could have children that take a long time to compute (in this case, it's a file system, where there may be network timeouts that prevent getting a list of files from an attached drive).
The problem I'm finding is this:
getChildCount() is called before the user specifically requests opening a particular branch of the tree. I believe this is done so the JTree knows whether to show a + icon next to the node.
An accurate count of children from getChildCount() would need to perform the potentially expensive operation
If I fake the value of getChildCount(), the tree only allocates space for that many child nodes before asking for an enumeration of the children. (If I return '1', I'll only see 1 child listed, despite that there are more)
The enumeration of the children can be expensive and time-consuming, I'm okay with that. But I'm not okay with getChildCount() needing to know the exact number of children.
Any way I can work around this?
Added: The other problem is that if one of the nodes represents a floppy drive (how archaic!), the drive will be polled before the user asks for its files; if there's no disk in the drive, this results in a system error.
Update: Unfortunately, implementing the TreeWillExpand listener isn't the solution. That can allow you to veto an expansion, but the number of nodes shown is still restricted by the value returned by TreeNode.getChildCount().
http://java.sun.com/docs/books/tutorial/uiswing/components/tree.html#data
Scroll down a little; there is a tutorial on exactly how to create lazy-loading nodes for the JTree, complete with examples and documentation.
I'm not sure if it's entirely applicable, but I recently worked around problems with a slow tree by pre-computing the answers to methods that would normally require going through the list of children. I only recompute them when children are added or removed or updated. In my case, some of the methods would have had to go recursively down the tree to figure out things like 'how many bytes are stored' for each node.
If you need a lot of access to a particular feature of your data structure that is expensive to compute, it may make sense to pre-compute it.
In the case of TreeNodes, this means that your TreeNodes would have to store their child count. To explain it in a bit more detail: when you create a node n0, it has a child count (cc) of 0. When you add a node n1 as a child of n0, you increase n0's cc by n1.cc + 1 (n1 itself plus everything below it).
The tricky bit is the remove operation. You have to keep backlinks to parents and go up the hierarchy to subtract the cc of your current node.
In case you just want a hasChildren feature for your nodes, or want to override getChildCount, a boolean might be enough and would not force you to go up the whole hierarchy on removal. Or you could remove the backlinks and just say that you lose precision on remove operations. The TreeNode interface actually doesn't force you to provide a remove operation, but you probably want one anyway.
Well, that's the deal. In order to come up with precomputed precise values, you will have to keep backlinks of some sort. If you don't, you'd better call your method hasHadChildren or the more amusing isVirgin.
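Here is one way the precomputed count with backlinks could look. I am reading "cc" as the number of nodes below a node (children, grandchildren and so on), which is an interpretation; CountingNode and its members are hypothetical and do not implement the Swing TreeNode interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node that caches how many nodes sit below it.
class CountingNode {
    private CountingNode parent;  // backlink used to update the ancestors
    private final List<CountingNode> children = new ArrayList<>();
    private int descendantCount;  // the cached "cc"

    void add(CountingNode child) {
        children.add(child);
        child.parent = this;
        adjust(child.descendantCount + 1);
    }

    void remove(CountingNode child) {
        if (children.remove(child)) {
            child.parent = null;
            adjust(-(child.descendantCount + 1));
        }
    }

    // Apply the change to this node and to every ancestor above it.
    private void adjust(int delta) {
        for (CountingNode n = this; n != null; n = n.parent) {
            n.descendantCount += delta;
        }
    }

    int getChildCount() {
        return children.size();   // direct children only
    }

    int getDescendantCount() {
        return descendantCount;   // whole subtree, no traversal needed
    }
}
```

Both queries are constant time; the cost moves to add and remove, which walk up to the root.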
There are a few parts to the solution:
Like Lorenzo Boccaccia said, use the TreeWillExpandListener
You also need to call nodesWereInserted on the tree's model (DefaultTreeModel), so the proper number of nodes will be displayed. See this code
I have determined that if you don't know the child count, TreeNode.getChildCount() needs to return at least 1 (it can't return 0)
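Putting those parts together, a lazily loaded node might look roughly like this. LazyNode, the "Loading..." placeholder and setLoadedChildren are made-up names; the sketch assumes the tree uses a DefaultTreeModel and that the slow directory listing happens elsewhere and hands back a list of names.

```java
import java.util.List;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.swing.tree.DefaultTreeModel;

// Hypothetical lazily loaded node. The placeholder child keeps
// getChildCount() at 1 until the real children are known.
class LazyNode extends DefaultMutableTreeNode {
    private boolean loaded;

    LazyNode(Object userObject) {
        super(userObject);
        add(new DefaultMutableTreeNode("Loading..."));
    }

    boolean isLoaded() {
        return loaded;
    }

    // Swap the placeholder for the real children and notify the model.
    // Call this on the event dispatch thread once the slow listing finishes.
    void setLoadedChildren(DefaultTreeModel model, List<String> names) {
        if (loaded) return;
        loaded = true;
        model.removeNodeFromParent((DefaultMutableTreeNode) getChildAt(0));
        int[] indices = new int[names.size()];
        for (int i = 0; i < names.size(); i++) {
            add(new LazyNode(names.get(i)));
            indices[i] = i;
        }
        // Tell the model how many nodes appeared so the JTree shows them all.
        if (indices.length > 0) model.nodesWereInserted(this, indices);
    }
}
```

The TreeWillExpandListener's treeWillExpand method would then check whether the expanding node is a LazyNode that is not yet loaded, fetch the file list (ideally off the event dispatch thread), and call setLoadedChildren with the result. Nothing touches the drive until the user actually expands the node.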