Best way to traverse an almost complete undirected weighted graph - java

I need help with an optimized solution for the following problem: http://acm.ro/prob/probleme/B.pdf.
Depending on the cost, I either traverse the graph using only new edges or using only
old edges. Both approaches work, but I need to pass the tests within a limited number of milliseconds,
and the algorithm for the old edges is dragging me down.
I need a way to optimize this; any suggestions are welcome.
EDIT: For safety reasons I am taking the algorithm down. I'm sorry, I'm new here, so I don't
know what I need to do to delete the post now that it has answers.

My initial algorithmic suggestion relied on an incorrect reading of the problem. Further, a textbook breadth-first search or Dijkstra on a graph of this size is unlikely to finish in a reasonable amount of time. There's likely an early-termination trick that you can employ for large cases; the thread Niklas B. linked to suggests several (as well as some other approaches). I couldn't find an early-termination trick that I could prove worked.
These micro-optimisation notes are still relevant, however:
I'd suggest not using Java's built-in Queue container for this (or any other built-in Java containers for anything else in a programming contest). It turns your 4-byte int into a gargantuan Integer structure. This is very probably where your blowup is coming from. You can use a 500000-long int[] to store the data in your queue and two ints for the front and back of the queue instead. In general, you want to avoid instantiating Objects in Java contest programming because of their overhead.
Likewise, I'd suggest representing the edges of the graph as either a single big int[] or a 500000-long int[][] to cut down on that piece of overhead.
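A minimal sketch of that array-backed queue idea (the 500000 capacity comes from the suggestion above; everything else is illustrative). Because every node enters a BFS queue at most once, no circular wraparound is needed:

int[] queue = new int[500000]; // plain ints, no Integer boxing
int head = 0, tail = 0;        // front and back of the queue

queue[tail++] = startNode;     // enqueue the start node (hypothetical variable)
while (head < tail) {
    int node = queue[head++];  // dequeue
    // ... for each unvisited neighbour v of node:
    // queue[tail++] = v;
}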

I only saw one queue in your code. That means you are searching from one direction only.
You may want to take a look at Bidirectional Search.
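For intuition, here is a hedged sketch of the idea on an unweighted graph (the contest graph is weighted, so this only illustrates the technique, not a drop-in solution; all names are mine). Two breadth-first searches grow from both endpoints and stop once no shorter meeting point can exist; two small frontiers usually visit far fewer nodes than one large one:

import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;

// Returns the length in edges of a shortest s-t path, or -1 if disconnected.
static int bidirectionalBfs(List<List<Integer>> adj, int s, int t) {
    if (s == t) return 0;
    int n = adj.size();
    int[] distS = new int[n], distT = new int[n];
    Arrays.fill(distS, -1);
    Arrays.fill(distT, -1);
    ArrayDeque<Integer> qS = new ArrayDeque<>(), qT = new ArrayDeque<>();
    distS[s] = 0; qS.add(s);
    distT[t] = 0; qT.add(t);
    int radiusS = 0, radiusT = 0, best = Integer.MAX_VALUE;
    // Once best <= radiusS + radiusT, no shorter meeting point can exist.
    while (!qS.isEmpty() && !qT.isEmpty() && best > radiusS + radiusT) {
        boolean expandS = qS.size() <= qT.size();   // grow the smaller side
        ArrayDeque<Integer> q = expandS ? qS : qT;
        int[] dMe = expandS ? distS : distT;
        int[] dOther = expandS ? distT : distS;
        for (int i = q.size(); i > 0; i--) {        // expand one full level
            int u = q.poll();
            for (int v : adj.get(u)) {
                if (dMe[v] != -1) continue;
                dMe[v] = dMe[u] + 1;
                if (dOther[v] != -1) {              // frontiers met at v
                    best = Math.min(best, dMe[v] + dOther[v]);
                }
                q.add(v);
            }
        }
        if (expandS) radiusS++; else radiusT++;
    }
    return best == Integer.MAX_VALUE ? -1 : best;
}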

Related

Is Object Oriented Approach with Graph Theory Algorithms faster in Java than general array based manipulations?

I always have a habit of creating lots of classes while solving graph theory problems, like:
class Node{
......
}
class Edge{
......
}
Often this runs me into performance and speed issues. Hence I feel that using arrays for storing graphs is faster than storing them in user-defined classes and structures like Lists and Maps, though the latter provide more flexibility and readability. So, does the use of arrays and plain language structures for representing graphs really give a significant performance boost? If yes, which should be the general choice while coding in Java?
Measure it.
Build a solution, put it into a profiler and look at where most of the computation time is used up. You cannot sensibly argue about this topic in general; you need experiments.
That said: in 98% of the cases, you are better off writing readable OO code. If it turns out to be too slow, narrow down the method that causes the trouble (with a profiler) and try to make that method faster. Don't start writing ugly code in the hope that it might be faster than the nice version.
The problem with arrays (in the naive adjacency-matrix sense) is that they can waste a huge amount of memory for big graphs, which usually have few links between nodes.
The performance boost you would get would depend not only on the data structure, but also on the type of graph and operations you perform on it.
E.g. deleting a node could be very expensive on an array implementation.
Well, yes, sometimes object-based graph structures are faster than arrays and sometimes not; basically it depends on your requirements. Java offers many collections: LinkedList, ArrayList, Vector, and so on, and all of them are reasonably fast. You just have to choose the one that best matches your requirements.
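To make the trade-off in this thread concrete, here is a hedged sketch of the same adjacency data in the two styles being debated (all names are illustrative). The object style allocates one object per node and per edge; the flat-array style, a compressed-sparse-row layout, keeps the whole graph in three int arrays, which stays compact even for sparse graphs and is allocation-free to traverse:

import java.util.ArrayList;
import java.util.List;

// Object-oriented style: one allocation per node and per edge.
class Node {
    final int id;
    final List<Edge> edges = new ArrayList<>();
    Node(int id) { this.id = id; }
}
class Edge {
    final Node to;
    final int weight;
    Edge(Node to, int weight) { this.to = to; this.weight = weight; }
}

// Flat-array style (compressed sparse row): for n nodes and m directed
// edges, the edges of node u occupy indices start[u] .. start[u+1]-1.
class CsrGraph {
    final int[] start;   // length n + 1
    final int[] to;      // target vertex of each edge; length m
    final int[] weight;  // weight of each edge; length m

    CsrGraph(int[] start, int[] to, int[] weight) {
        this.start = start; this.to = to; this.weight = weight;
    }

    long sumOfWeightsFrom(int u) {       // example traversal of u's edges
        long sum = 0;
        for (int e = start[u]; e < start[u + 1]; e++) {
            sum += weight[e];            // to[e] is the neighbour, unused here
        }
        return sum;
    }
}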

Subgraph matching (JUNG)

I have a set of subgraphs and I need to match them against the graph they were extracted from. I also need to count how many times each subgraph shows up in that graph (I need to store all possible matches). There must be a perfect match considering the edge labels of both subgraph and graph; the vertex labels, however, don't need to match each other. I built my system using the JUNG API, so I would like a solution (API, algorithm etc.) that can deal with the Graph structure provided by JUNG. Any thoughts?
JUNG is very full-featured, so if there isn't a graph analysis algorithm in JUNG for what you need, there's usually a strong, theoretical reason for it. To me, your problem sounds like an instance of the Subgraph Isomorphism problem, which is NP-Complete:
http://en.wikipedia.org/wiki/Subgraph_isomorphism_problem
NP-Completeness may or may not be familiar to you (it took me 7 years of college and a Master's degree in Computer Science to understand it!), so I'll give a high-level description here. Certain problems, like sorting, can be solved in Polynomial time with respect to their input size. For example, if I have a list of N elements, I can sort it in O(N log(N)) time. More specifically, if I can solve a problem in Polynomial time, this means I can solve the problem without exhausting every possible solution. In the sorting case, I could traverse every possible permutation of the list and, if I found a permutation that was sorted, return it. This is obviously not the fastest way to solve the problem though! Some very clever mathematicians were able to get it down to its theoretical minimum of O(N log(N)) for comparison-based sorting, thus we can sort really big lists of things quite quickly using computers today.
On the flip-side, NP-Complete problems are thought to have no Polynomial time solution (I say thought because no one has ever proven it, although evidence strongly suggests this is the case). Anyway, what this means is that you cannot definitively solve an NP-Complete problem without, in the worst case, exhausting every possible solution. The best algorithms known for NP-Complete problems are O(c ^ N) or worse, where c is some constant greater than 1. This means that the time required to solve the problem grows exponentially with every incremental increase in problem size.
So what does this have to do with my problem?
What I'm getting at here is that, if the Subgraph Isomorphism problem is NP-Complete, then the only way you can determine if one graph is a subgraph of another graph is, in the worst case, by exhausting every possible solution. So you can solve this, but probably only up to graphs of a few nodes or so (since the problem's time complexity grows exponentially with every incremental increase in graph size). Beyond a certain graph size it becomes computationally infeasible: finding a solution will, for all practical purposes, take forever.
More practically, if your boss asks you to do something that is provably NP-Complete, you can simply say it's impossible and he will have to listen to you. If your professor asks you to do something that is provably NP-Complete, show him that it's NP-Complete and you'll probably get an A for the course. If YOU are trying to do something NP-Complete of your own accord, it's better to just move on to the next project... ;)
Well, I had to solve the problem by implementing it from scratch. I followed the strategy suggested in the topic Any working example of VF2 algorithm?. So, if someone else is stuck on this problem, I suggest taking a look at Rich Apodaca's answer in the aforementioned topic.
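For readers who want to see the brute-force idea discussed above made concrete, here is a toy backtracking matcher on boolean adjacency matrices (my own sketch; it ignores labels and has none of VF2's pruning, so it is only practical for very small graphs). It tries to map each pattern vertex to a distinct graph vertex while preserving the pattern's edges:

// pattern and graph are adjacency matrices; map[p] = graph vertex chosen
// for pattern vertex p; used[] marks graph vertices already taken.
static boolean match(boolean[][] pattern, boolean[][] graph,
                     int[] map, boolean[] used, int p) {
    if (p == pattern.length) return true;        // all pattern vertices mapped
    for (int g = 0; g < graph.length; g++) {
        if (used[g]) continue;
        boolean ok = true;
        for (int q = 0; q < p; q++) {            // edges to already-mapped vertices
            if (pattern[p][q] && !graph[g][map[q]]) { ok = false; break; }
            if (pattern[q][p] && !graph[map[q]][g]) { ok = false; break; }
        }
        if (!ok) continue;
        map[p] = g; used[g] = true;              // tentatively map p -> g
        if (match(pattern, graph, map, used, p + 1)) return true;
        used[g] = false;                         // backtrack
    }
    return false;
}

Called as match(pattern, graph, new int[pattern.length], new boolean[graph.length], 0). Counting all matches, as the question requires, means recursing through every successful mapping rather than returning at the first one.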

Which Java data object to use for multidimensional range matching?

Project Background:
I am writing a map tile overlay class for java that can use gdal2tile.py tiles. Basically I will end up with thousands of jpg files that are in a file structure like
"Zoom Level/X coordinate/Y coordinate"
The coordinates are ints but will not necessarily start at 0 or 1.
I will have to search for tiles that are within a certain range to find out which ones I need to render.
My Problem:
I tried iterating using the file structure itself but it is wicked slow (not surprising).
I tried iterating using an ArrayList of strings of the file structure and .contains() but it seems to be even slower (not too surprising).
Optimally I would like to use a data structure that would let me choose a range on multiple dimensions so that I can call something like:
Tiles.getWhere(zoomLevel, minX, maxX, minY, maxY);
I assume that some sort of Collection or TreeMap would be the right choice but I'm not experienced enough with Java to know for sure and I'd prefer not to have to benchmark a lot of different approaches.
I could use SQLite to do it but that seems like overkill.
My Question:
What is the most efficient way to check for the existence of datasets given multiple dimensional constraints?
Maybe you are looking for a map with multiple keys.
Commons-collections provides a map with multiple lookup keys:
http://commons.apache.org/collections/apidocs/org/apache/commons/collections/map/MultiKeyMap.html
A hash-based map like this offers expected O(1) insertion and lookup, but note that it answers exact-key queries, not range queries.
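Since the question asks for range lookups, here is a hedged alternative sketch (my own, assuming Java 8+): nested java.util.TreeMaps keyed zoom -> x -> y, whose subMap() views give exactly the getWhere() call from the question. Tile is a placeholder for whatever the tile objects are:

import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class TileIndex<Tile> {
    private final TreeMap<Integer, TreeMap<Integer, TreeMap<Integer, Tile>>> index =
            new TreeMap<>();

    void put(int zoom, int x, int y, Tile tile) {
        index.computeIfAbsent(zoom, z -> new TreeMap<>())
             .computeIfAbsent(x, xx -> new TreeMap<>())
             .put(y, tile);
    }

    // All tiles at one zoom level with minX <= x <= maxX and minY <= y <= maxY.
    List<Tile> getWhere(int zoom, int minX, int maxX, int minY, int maxY) {
        List<Tile> result = new ArrayList<>();
        TreeMap<Integer, TreeMap<Integer, Tile>> byX = index.get(zoom);
        if (byX == null) return result;
        for (TreeMap<Integer, Tile> byY : byX.subMap(minX, true, maxX, true).values()) {
            result.addAll(byY.subMap(minY, true, maxY, true).values());
        }
        return result;
    }
}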
Thinking about your problem, I see three directions in which you could aim your search next (this is not a step-by-step guide, but rather an out-of-the-box brain-opener for the stuck situation you have faced):
1) Use of Java's built-in structures. Yes, indeed, a list is about the worst structure to search. A Map, as the name suggests, is far more convenient for maps. It is not only the name: indexing into a Map is significantly less time-consuming than scanning a List. You can picture your data as a cube, where you have to handle about half of the dots inside it if you use a List, and probably only a narrow layer of it when you search by indexing a Map. That is an order-of-magnitude difference. So, my answer here: Map is a keyword pointing in the right direction (assuming you still want to do it this way after reading the rest of my answer).
2) Use of a map-server solution. This is probably far from your current approach, but entire frameworks are made for solving your type of question. One example is GeoServer. It has a ready-made solution for the entire problem, and is a stable solution for the great big problem possibly in your hands: showing a map to a user from a source.
3) Sticking with the GDAL framework you were using, you could select a slightly different py-file, like gdal_proximity.py, and you have a search capability in your hands. That particular one searches by a centre point and a distance, but it may do the stuff you need =)
That's a starting point for how I would approach it. Could this be of use?
Sounds to me like you are looking for something like an Interval Tree.
http://en.wikipedia.org/wiki/Interval_tree
I have implemented one of these in the past but only in one dimension. The Wikipedia reference mentions extensions to more dimensions.
Paul
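For reference, a toy sketch of the one-dimensional structure mentioned above, following the classic textbook formulation (unbalanced for brevity, so it can degrade on sorted input): each node caches the maximum right endpoint in its subtree, which lets the overlap search skip whole branches:

class IntervalTree {
    static class Node {
        final int lo, hi;      // the stored interval [lo, hi]
        int maxEnd;            // maximum hi anywhere in this subtree
        Node left, right;
        Node(int lo, int hi) { this.lo = lo; this.hi = hi; this.maxEnd = hi; }
    }

    private Node root;

    void insert(int lo, int hi) { root = insert(root, lo, hi); }

    private Node insert(Node n, int lo, int hi) {
        if (n == null) return new Node(lo, hi);
        if (lo < n.lo) n.left = insert(n.left, lo, hi);
        else n.right = insert(n.right, lo, hi);
        n.maxEnd = Math.max(n.maxEnd, hi);
        return n;
    }

    // Returns one stored interval overlapping [lo, hi], or null if none.
    int[] searchOverlap(int lo, int hi) {
        Node n = root;
        while (n != null) {
            if (n.lo <= hi && lo <= n.hi) return new int[]{n.lo, n.hi};
            // descend left only if the left subtree can still contain an overlap
            n = (n.left != null && n.left.maxEnd >= lo) ? n.left : n.right;
        }
        return null;
    }
}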

What is the most efficient way to create a tree in java?

I am creating a tree in Java to model the extensive form of a game for an AI. The tree will be a 25-ary tree (a tree in which each node has at most 25 children) because at each turn of the game there are 25 different moves. Because the number of new branches that have to be created for each new layer of the tree is 25^n, I'm very concerned with making this efficient. (I intend to remorselessly cut off branches to keep them from growing, in order to keep things from getting bogged down.) What is the best way to model such a tree when efficiency is such a concern? My first impression is to have a node object where each node has a parent node and an array of child nodes, but this means creating a lot of objects. Ultimately these are my questions:
Is this the fastest way to create and manage my tree?
What is a good way to figure out how much time any given algorithm or process in a program is going to take? (The only one I've thought of so far is to record a timestamp before and after the process and compare the number of milliseconds that have passed.)
Any other thoughts are also welcome. My question implies, and is related to, a great number of other questions, I would expect. If I have been ambiguous or unclear, please comment to let me know instead of down-voting and storming off, as this isn't productive.
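(On the second question: the timestamp idea works, but System.nanoTime() is a better clock than constructing Date objects, since it is monotonic and has sub-millisecond resolution. A minimal sketch, where the method being timed is hypothetical:)

long start = System.nanoTime();
expandTreeOneLayer();                                  // hypothetical workload
long elapsedMs = (System.nanoTime() - start) / 1000000;
System.out.println("took " + elapsedMs + " ms");

For anything beyond a rough estimate, run the workload many times first so the JIT warms up, or use a profiler.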
Realistically, the way you described is the best approach. It'll perform reasonably well compared to anything else you could do and will be straightforward to implement.
Time and again people are asking questions about how to do something "efficiently". The best answer is nearly always, "don't even bother trying". Unless your improvement is an algorithmic one, it's unlikely to make much difference anyway, and especially in a case like this, the extra effort and complexity isn't worth whatever miniscule gain you might be able to achieve.
Putting it another way, and to borrow a quote (though I can't remember the originator), the first rule of optimization is: don't.
Having said that, if you really feel the need to squeeze every last drop of speed, you could try caching and re-using objects: instead of discarding them completely, keep track of them in a free object store, and when you need a new object, first check the store for an existing one. As always, you'll need to measure performance before and after to see if it really helps (chances are it won't help much, unless physical memory is really constrained, in which case garbage collection can become expensive).
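A minimal sketch of that free-object-store idea (my own illustration; the Node type and its reset logic are hypothetical stand-ins for the tree nodes in question):

import java.util.ArrayDeque;

class Node {
    Object payload;                 // hypothetical per-node state
    void reset() { payload = null; }
}

class NodePool {
    private final ArrayDeque<Node> free = new ArrayDeque<>();

    Node obtain() {
        Node n = free.poll();       // reuse a recycled node if one exists
        return (n != null) ? n : new Node();
    }

    void recycle(Node n) {
        n.reset();                  // clear stale state before reuse
        free.push(n);
    }
}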
I agree with the previous comment about only optimizing once you have implemented the rest of the application.
On the other hand, I do realize a few things that may be of importance:
Branching factor of 25: although not ridiculously huge, it is still large with respect to other problems. For a tree, you will definitely need, for each node, a collection of its child nodes. You can do this either by making a Node class which has a collection of nodes within it, or by having an external Map that maps a given node to a list of its children.
Removing and adding of elements will happen: this lends itself to a LinkedList implementation for the stored children, since you don't want to perform costly removes and adds. A HashSet may work also, but the problem is that you may need more memory.
Iteration over the elements may or may not be done: if you want to iterate over the entire list at each step, LinkedLists are fine. If you want to prioritize the nodes, then you may save time by using a priority queue data structure. Priority queues are especially helpful if you are going to implement a heuristic function and evaluate which child to move to at any given node.
That's all I have so far, but I'll keep updating if I think of more things, or if you update your content.
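A hedged sketch of the first option above, a Node class holding its own children, with the child list pre-sized to the branching factor of 25 (the move and score fields are hypothetical placeholders for real game state):

import java.util.ArrayList;
import java.util.List;

class GameNode {
    final GameNode parent;
    final List<GameNode> children = new ArrayList<>(25); // avoid regrowing
    int move;        // which of the 25 moves led here (hypothetical)
    double score;    // heuristic value used for pruning (hypothetical)

    GameNode(GameNode parent, int move) {
        this.parent = parent;
        this.move = move;
    }

    // "Remorselessly cut off" a weak subtree: dropping the reference is
    // enough; the garbage collector reclaims the whole branch.
    void prune(GameNode child) {
        children.remove(child);
    }
}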

What's the most efficient way of implementing a fixed-size non-blocking queue in Java?

I am trying to find (or write) a Java class that represents a fixed-size, non-blocking, auto-discarding FIFO queue. (e.g. if the queue has a capacity of 100, putting item 101 removes item 1 then successfully appends item 101.) The answer to this question seems helpful, but I have an extra constraint - I need it to be fast, for capacities of around 100-1000.
The items in my queue are only Floats, so is it generally more efficient to use something like the AutoDiscardingDeque<Float> described in the linked question, or just to use a float[] and some System.arraycopy() manipulation to handle it?
Alternatively, is there an even better solution that I haven't thought of?
If you only ever need to use floats, then yes, a float[] will be optimal in the implementation. You shouldn't need to copy the array at all - just maintain a "start position" and an "end position". You already know the capacity, so you can create the array to start with and never budge from it.
Note that I'm not suggesting you use a float[] instead of a queue here - just that you can implement the queue using a float[]. Of course, that means you can't easily make it implement Deque<Float> which you may want it to, without incurring boxing/unboxing costs... but if you're happy only ever using the concrete class within your client code, you'll end up with efficiency savings.
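A hedged sketch of the float[]-backed queue described above (my own illustration): a circular buffer where adding to a full queue overwrites the oldest element, so there is no boxing and no System.arraycopy():

class FloatRingQueue {
    private final float[] data;
    private int start = 0;   // index of the oldest element
    private int size = 0;

    FloatRingQueue(int capacity) {
        data = new float[capacity];
    }

    void add(float f) {
        if (size < data.length) {
            data[(start + size) % data.length] = f;
            size++;
        } else {
            data[start] = f;                    // overwrite the oldest element
            start = (start + 1) % data.length;  // ... and advance the head
        }
    }

    float get(int i) {                          // i = 0 is the oldest element
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException();
        return data[(start + i) % data.length];
    }

    int size() { return size; }
}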
If you think you are going to want to perform a number of math-related functions on your structure, specifically statistics functions like mean, max, min, etc., then you could use DescriptiveStatistics from Apache Commons Math (http://commons.apache.org/math/userguide/stat.html#a1.2_Descriptive_statistics). You can set your window size and it will automatically maintain your elements. However, it takes doubles, not floats, so it might not be the perfect solution for you.
"I need it to be fast, for capacities of around 100-1000."
Please specify which operations you need to be fast; the best implementation is very sensitive to how you are going to use it.
If you need to access it by index very often, then the solution above seems good enough.
