Subgraph matching (JUNG)

Subgraph matching (JUNG) - java

I have a set of subgraphs and I need to match them on the graph they were extracted from. I also need to count how many times each subgraph shows up in such graph (I need to store all possible matches). There must be a perfect match considering the edges' labels of both subgraph and graph, the vertices' labels, however, don´t need to match each other. I built my system using JUNG API, so I would like a solution (api, algorithm etc) that could deal with the Graph structure provided by JUNG. Any thoughts?

JUNG is very full-featured, so if there isn't a graph analysis algorithm in JUNG for what you need, there's usually a strong, theoretical reason for it. To me, your problem sounds like an instance of the Subgraph Isomorphism problem, which is NP-Complete:
http://en.wikipedia.org/wiki/Subgraph_isomorphism_problem
NP-Completeness may or may not be familiar to you (it took me 7 years of college and Master's Degree in Computer Science to understand it!), so I'll give a high-level description here. Certain problems, like sorting, can be solved in Polynomial time with respect to their input size. For example, if I have a list of N elements, I can sort it in O(N log(N)) time. More specifically, if I can solve a problem in Polynomial time, this means I can solve the problem without exhausting every possible solution. In the sorting case, I could traverse every possible permutation of the list and, if I found a permutation of the list that was sorted, return it. This is obviously not the fastest way to solve the problem though! Some very clever mathematicians were able to get it down to its theoretical minimum of O(N log(N)), thus we can sort really big lists of things quite quickly using computers today.
On the flip-side, NP-Complete problems are thought to have no Polynomial time solution (I say thought because no one has ever proven it, although evidence strongly suggests this is the case). Anyway, what this means is that you cannot definitively solve an NP-Complete problem without first exhausting every possible solution. The time complexity of NP-Complete problems are always O(c ^ N) or worse, where c is some constant greater than 1. This means that the time required to solve the problem grows exponentially with every incremental increase in problem size.
So what does this have to do with my problem???
What I'm getting at here is that, if the Subgraph Isomorphism problem is NP-Complete, then the only way you can determine if one graph is a subgraph of another graph is by exhausting every possible solution. So you can solve this, but probably only up to graphs of a few nodes or so (since the problem's time complexity grows exponentially with every incremental increase in graph size). This means that it is computationally infeasible to compute a solution for your problem because as soon as you reach a certain graph size, it will quite literally take forever to find a solution.
More practically, if your boss asks you to do something that is provably NP-Complete, you can simply say it's impossible and he will have to listen to you. If your professor asks you to do something that is provably NP-Complete, show him that it's NP-Complete and you'll probably get an A for the course. If YOU are trying to do something NP-Complete of your own accord, it's better to just move on to the next project... ;)

Well, I had to solve the problem by implementing it from scratch. I followed the strategy suggested in the topic Any working example of VF2 algorithm?. So, if someone is in doubt about this problem too, I suggest to take a look at Rich Apodaca's answer in the aforementioned topic.

Related

Finding Admissible Heuristics for Martian Rover using A*

I'm trying to solve a problem about AI. I have a "robot" that should go from a point A to B much quickly and cheap as possible. This Rover can't climb heights higher than 10 units and the cost of his route is influenced by the kind of terrain. I need your help becouse I need to find a admissible heuristic for solving my problem. I already try the Euclidian distance but it's not enough. Can you help me?

I would recommend taking a look at this page for some ideas of heuristics. They will not all be applicable in your situation, since I don't believe you have a grid map? But you can at least have a look.
Furthermore, I'd recommend trying to take into account the various costs that are possible in different kinds of terrain. If, for example, you know that every single possible type of terrain has a minimum cost of 2, you can safely multiply your Euclidean distance by 2. If the most common type of terrain has a cost of 2, but there also are some types of terrain with a lower cost, you lose the guarantee of finding an optimal solution if you start multiplying by 2, but you may still find solutions more quickly in practice.
To me, the question sounds like a homework assignment (correct me if I'm wrong), which makes it difficult to give a single answer. I guess that the point of the homework assignment even is to research a bit and try some different things and see how they work (or don't work), which can improve your understanding of how the algorithm works. So, really, just try some things.

Best way to traverse an almost complete undirected weighted graph

need help with an optimized solution for the following problem http://acm.ro/prob/probleme/B.pdf.
Depending on the cost i either traverse the graph using only new edges, or using only
old edges, both of them work, but i need to pass test in a limited number of milliseconds,
and the algorithm for the old edges is dragging me down.
I need a way to optimize this, any suggestions are welcome
EDIT: for safety reasons i am taking the algorithm down, i'm sorry, i'm new so I don't
know what i need to do to delete the post now that it has answers

My initial algorithmic suggestion relied on an incorrect reading of the problem. Further, a textbook breadth-first search or Dijkstra on a graph of this size is unlikely to finish in a reasonable amount of time. There's likely an early-termination trick that you can employ for large cases; the thread Niklas B. linked to suggests several (as well as some other approaches). I couldn't find an early-termination trick that I could prove worked.
These micro-optimisation notes are still relevant, however:
I'd suggest not using Java's built-in Queue container for this (or any other built-in Java containers for anything else in a programming contest). It turns your 4-byte int into a gargantual Integer structure. This is very probably where your blowup is coming from. You can use a 500000-long int[] to store the data in your queue and two ints for the front and back of the queue instead. In general, you want to avoid instantiating Objects in Java contest programming because of their overhead.
Likewise, I'd suggest representing the edges of the graph as either a single big int[] or a 500000-long int[][] to cut down on that piece of overhead.

I only saw one queue in you code. That means you are searching from one direction only.
You may want to take a look at
Bidirectional Search

What is the most efficent way to create a tree in java?

I am creating a tree in java to model the Extensive-form of a game for an AI. The tree will be an 25-ary tree (a tree in which each branch at most has 25 child branches) because at each turn of the game there are 25 different moves. Because the number of new branches that have to be created in each new layer of the tree is 25^n I'm very concerned with making this efficient. (I intend to remorselessly cut of branches to keep them from growing in order to keep things from getting bogged down). What is the best way to model such a tree when efficiency is such a concern? My first impression is to have a node object where each node has a parent node and an array of child nodes but this means creating a lot of objects. Ultimately these are my questions:
Is this the fastest way create and manage my tree?
What is a good way to figure out how much time any given algorithm or process in a program is going to take? (the only one I've thought of so far is to create a date before the process and then after and compare the # of milliseconds that have passed)
Any other thoughts are also welcome. My question implies and is related to a great number of other questions, i would expect. If i have been ambiguous or unclear please comment to let me know instead of down-voting and storming off as this isn't productive.

Realistically, the way you described is the best approach. It'll perform reasonably well compared to anything else you could do and will be straightforward to implement.
Time and again people are asking questions about how to do something "efficiently". The best answer is nearly always, "don't even bother trying". Unless your improvement is an algorithmic one, it's unlikely to make much difference anyway, and especially in a case like this, the extra effort and complexity isn't worth whatever miniscule gain you might be able to achieve.
Putting it another way, and to borrow a quote (though I can't remember the originator), the first rule of optimization is: don't.
Having said that, if you really feel the need to squeeze every last drop of speed, you could try caching and re-using objects (instead of discarding them completely, keep track of them in a free object store, and then when you need to create a new object, first check the free object store to check if there is an existing one). As always, you'll need to measure performance before and after to see if it really helps (chances are it won't help much, unless physical memory is really constrained, in which case garbage collection can become expensive).

I agree with the previous comment about only optimizing once you have implemented the rest of the application.
On the other hand, I do realize a few things that may be of importance:
Branching factor of 25: Although not ridiculously huge, it is still large with respect to other problems. For a tree, you will definitely have to have a list for each node to indicate the list of SubNodes. You can do this either by making a Node class which has a collection of nodes within it, or have an external Map that maps a given node to a list of children nodes.
Removing and adding of elements will be done: This lends itself to a LinkedList implementation of the stored children since you don't want to perform costly removes and adds. A HashSet may work also, but the problem is that you may need more memory.
Iteration of the elements may or may not be done: If you want to iterate over the entire list at each step, LinkedLists are fine. If you want to prioritize the nodes then you may be saving memory by using a priority queue data structure. Priority queues are especially helpful if you are going to implement a heuristic function and evaluate which child to move to at any given node.
Thats all I have so far, but I'll keep updating if I think of more things, or if you update your content.

Easiest to code algorithm for Rubik's cube?

What would be a relatively easy algorithm to code in Java for solving a Rubik's cube. Efficiency is also important but a secondary consideration.

Perform random operations until you get the right solution. The easiest algorithm and the least efficient.

The simplest non-trivial algorithm I've found is this one:
http://www.chessandpoker.com/rubiks-cube-solution.html
It doesn't look too hard to code up. The link mentioned in Yannick M.'s answer looks good too, but the solution of 'the cross' step looks like it might be a little more complex to me.
There are a number of open source solver implementations which you might like to take a look at. Here's a Python implementation. This Java applet also includes a solver, and the source code is available. There's also a Javascript solver, also with downloadable source code.
Anthony Gatlin's answer makes an excellent point about the well-suitedness of Prolog for this task. Here's a detailed article about how to write your own Prolog solver. The heuristics it uses are particularly interesting.

Might want to check out: http://peter.stillhq.com/jasmine/rubikscubesolution.html
Has a graphical representation of an algorithm to solve a 3x3x3 Rubik's cube

I understand your question is related to Java, but on a practical note, languages like Prolog are much better suited problems like solving a Rubik's cube. I assume this is probably for a class though and you may have no leeway as to the choice of tool.

You can do it by doing BFS(Breadth-First-Search). I think the implementation is not that hard( It is one of the simplest algorithm under the category of the graph). By doing it with the data structure called queue, what you will really work on is to build a BFS tree and to find a so called shortest path from the given condition to the desire condition. The drawback of this algorithm is that it is not efficient enough( Without any modification, even to solver a 2x2x2 cubic the amount time needed is ~5 minutes). But you can always find some tricks to boost the speed.
To be honest, it is one of the homework of the course called "Introduction of Algorithm" from MIT. Here is the homework's link: http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/assignments/MIT6_006F11_ps6.pdf. They have a few libraries to help you to visualize it and to help you avoid unnecessary effort.

For your reference, you can certainly look at this java implementation. -->
Uses two phase algorithm to solve rubik's cube. And have tried this code and it works as well.

One solution is to I guess simultaneously run all possible routes. That does sound stupid but here's the logic - over 99% of possible scrambles will be solvable in under 20 moves. This means that although you cycle through huge numbers of possibilities you are still going to do it eventually. Essentially this would work by having your first step as the scrambled cube. Then you would have new cubes stored in variables for each possible move on that first cube. For each of these new cubes you do the same thing. After each possible move check if it is complete and if so then that is the solution. Here to make sure you have the solution you would need an extra bit of data on each Rubiks cube saying the moves done to get to that stage.

Solving nonlinear equations numerically

I need to solve nonlinear minimization (least residual squares of N unknowns) problems in my Java program. The usual way to solve these is the Levenberg-Marquardt algorithm. I have a couple of questions
Does anybody have experience on the different LM implementations available? There exist slightly different flavors of LM, and I've heard that the exact implementation of the algorithm has a major effect on the its numerical stability. My functions are pretty well-behaved so this will probably not be a problem, but of course I'd like to choose one of the better alternatives. Here are some alternatives I've found:
FPL Statistics Group's Nonlinear Optimization Java Package. This includes a Java translation of the classic Fortran MINPACK routines.
JLAPACK, another Fortran translation.
Optimization Algorithm Toolkit.
Javanumerics.
Some Python implementation. Pure Python would be fine, since it can be compiled to Java with jythonc.
Are there any commonly used heuristics to do the initial guess that LM requires?
In my application I need to set some constraints on the solution, but luckily they are simple: I just require that the solutions (in order to be physical solutions) are nonnegative. Slightly negative solutions are result of measurement inaccuracies in the data, and should obviously be zero. I was thinking to use "regular" LM but iterate so that if some of the unknowns becomes negative, I set it to zero and resolve the rest from that. Real mathematicians will probably laugh at me, but do you think that this could work?
Thanks for any opinions!
Update: This is not rocket science, the number of parameters to solve (N) is at most 5 and the data sets are barely big enough to make solving possible, so I believe Java is quite efficient enough to solve this. And I believe that this problem has been solved numerous times by clever applied mathematicians, so I'm just looking for some ready solution rather than cooking my own. E.g. Scipy.optimize.minpack.leastsq would probably be fine if it was pure Python..

The closer your initial guess is to the solution, the faster you'll converge.
You said it was a non-linear problem. You can do a least squares solution that's linearized. Maybe you can use that solution as a first guess. A few non-linear iterations will tell you something about how good or bad an assumption that is.
Another idea would be trying another optimization algorithm. Genetic and ant colony algorithms can be a good choice if you can run them on many CPUs. They also don't require continuous derivatives, so they're nice if you have discrete, discontinuous data.

You should not use an unconstrained solver if your problem has constraints. For
instance if know that some of your variables must be nonnegative you should tell
this to your solver.
If you are happy to use Scipy, I would recommend scipy.optimize.fmin_l_bfgs_b
You can place simple bounds on your variables with L-BFGS-B.
Note that L-BFGS-B takes a general nonlinear objective function, not just
a nonlinear least-squares problem.

I agree with codehippo; I think that the best way to solve problems with constraints is to use algorithms which are specifically designed to deal with them. The L-BFGS-B algorithm should probably be a good solution in this case.
However, if using python's scipy.optimize.fmin_l_bfgs_b module is not a viable option in your case (because you are using Java), you can try using a library I have written: a Java wrapper for the original Fortran code of the L-BFGS-B algorithm. You can download it from http://www.mini.pw.edu.pl/~mkobos/programs/lbfgsb_wrapper and see if it matches your needs.

The FPL package is quite reliable but has a few quirks (array access starts at 1) due to its very literal interpretation of the old fortran code. The LM method itself is quite reliable if your function is well behaved. A simple way to force non-negative constraints is to use the square of parameters instead of the parameters directly. This can introduce spurious solutions but for simple models, these solutions are easy to screen out.
There is code available for a "constrained" LM method. Look here http://www.physics.wisc.edu/~craigm/idl/fitting.html for mpfit. There is a python (relies on Numeric unfortunately) and a C version. The LM method is around 1500 lines of code, so you might be inclined to port the C to Java. In fact, the "constrained" LM method is not much different than the method you envisioned. In mpfit, the code adjusts the step size relative to bounds on the variables. I've had good results with mpfit as well.
I don't have that much experience with BFGS, but the code is much more complex and I've never been clear on the licensing of the code.
Good luck.

I haven't actually used any of those Java libraries so take this with a grain of salt: based on the backends I would probably look at JLAPACK first. I believe LAPACK is the backend of Numpy, which is essentially the standard for doing linear algebra/mathematical manipulations in Python. At least, you definitely should use a well-optimized C or Fortran library rather than pure Java, because for large data sets these kinds of tasks can become extremely time-consuming.
For creating the initial guess, it really depends on what kind of function you're trying to fit (and what kind of data you have). Basically, just look for some relatively quick (probably O(N) or better) computation that will give an approximate value for the parameter you want. (I recently did this with a Gaussian distribution in Numpy and I estimated the mean as just average(values, weights = counts) - that is, a weighted average of the counts in the histogram, which was the true mean of the data set. It wasn't the exact center of the peak I was looking for, but it got close enough, and the algorithm went the rest of the way.)
As for keeping the constraints positive, your method seems reasonable. Since you're writing a program to do the work, maybe just make a boolean flag that lets you easily enable or disable the "force-non-negative" behavior, and run it both ways for comparison. Only if you get a large discrepancy (or if one version of the algorithm takes unreasonably long), it might be something to worry about. (And REAL mathematicians would do least-squares minimization analytically, from scratch ;-P so I think you're the one who can laugh at them.... kidding. Maybe.)

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.