Filtering Gremlin search by parent Vertex - java

I'm very new to Gremlin. I have been going through the documentation but continue to struggle to find an answer to my problem. I assume the answer is simple, but I have become a little confused by all the different API options (e.g. subgraphs, side effects) and would like some help and clarity from the experts if possible.
Basically (as an example), I have a graph in which I first need to select 'A' and then traverse down only the children of 'A' to find whether there is a vertex that matches 'A3' or 'A4'.
Selecting the first vertex is, of course, easy; I simply do something like:
g.V().has("name", "A")
However, I'm not sure how I can now restrict my second vertex search to the children of 'A' only. As I mentioned before, I have stumbled upon subgraphs but have not been able to fully grasp how I could leverage this capability, or whether I should for my purpose.
I'm using TinkerPop3 and Java 8.
Any help will be greatly appreciated!

When you start your traversal with g.V().has('name','A'), you get the "A" vertex. Any additional steps that you add after that start from that one vertex. Therefore g.V().has('name','A').out() can only ever return vertices directly adjacent to "A" (such as "A1"), never vertices elsewhere in the graph.
To traverse through all the children of "A" (and their children, and so on), you need the repeat() step:
g.V().has('name','A').
repeat(out()).
until(has('name',within('A3','A4')))
So, basically find "A", then traverse through children until you run into "A3" or "A4".
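To make the semantics concrete without a running graph database, here is a plain-Java (Java 8) sketch of what the traversal above does over a hypothetical adjacency map: start at "A", take at least one out() step, and stop each branch at the first vertex whose name is in the target set. The map layout and the class name DescendantSearch are assumptions for illustration only. Unlike Gremlin's repeat(), this version keeps a visited set, so it terminates on cyclic graphs and reports each vertex once.

```java
import java.util.*;

/** Plain-Java model of g.V().has('name','A').repeat(out()).until(has('name',within('A3','A4'))). */
final class DescendantSearch {
    /**
     * children: hypothetical adjacency map, vertex name -> names reachable by one out() step.
     * Returns the first matching vertex on each branch below start. The start vertex itself is
     * never tested, just as repeat() runs out() once before until() is checked.
     */
    static List<String> findDescendants(Map<String, List<String>> children, String start, Set<String> targets) {
        List<String> found = new ArrayList<>();
        Set<String> visited = new HashSet<>();   // cycle guard; Gremlin's repeat() has no such guard
        Deque<String> queue = new ArrayDeque<>(children.getOrDefault(start, Collections.emptyList()));
        while (!queue.isEmpty()) {
            String v = queue.poll();
            if (!visited.add(v)) continue;       // already reached via another path
            if (targets.contains(v)) {
                found.add(v);                     // until() satisfied: emit and stop this branch
            } else {
                queue.addAll(children.getOrDefault(v, Collections.emptyList())); // another out() step
            }
        }
        return found;
    }
}
```

With edges A → A1 → A3 and a separate B → A4, searching from "A" returns only A3, because A4 is not below "A".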
In the future, please consider supplying a Gremlin script that can be pasted into the console to construct your example graph - here's an example. An example graph in that form is quite helpful.

Related

Compiling responses of gremlin traversal query

I have the following graph structure:
Vertices: Company, Campaign, Project, Lead
{"Name":["CompanyV"],"sid":["SidFromSQL"]}
{"name":["Campaign3V"],"status":["paused"]}
{"name":["Campaign1"],"startDate":["Jan 1, 2019 5:30:00 AM"]}
{"name":["Campaign2V"],"status":["active"]}
{"name":["Lead11V"]}
{"name":["Lead2V"]}
{"name":["Project1V"],"Name":[""],"sid":["SidFromSQL"]}
{"name":["Project2V"],"Name":[""],"sid":["SidFromSQL"]}
{"name":["Lead3V"]}
{"name":["Campaign1V"],"status":["active"]}
Edges:
{"inVertex":{"id":"58b6e79f-6809-6fc4-9f0a-c8a26337a729","label":"Campaign"},"outVertex":{"id":"1cb6e79d-1ca7-4d3c-e71d-c05c13abac15","label":"Project"},"id":"a6b6e79f-6809-87bf-a535-eec9101e683c","label":"hasCampaign"}
{"inVertex":{"id":"c4b6e7ae-64d3-b8b9-ce7b-c319e7ed70ca","label":"Lead"},"outVertex":{"id":"1cb6e79d-1ca7-4d3c-e71d-c05c13abac15","label":"Project"},"id":"6cb6e7ae-64d4-6fb0-9314-411eb72d9a28","label":"hasLead"}
{"inVertex":{"id":"a2b6e79d-1ca7-db4e-f19f-2ef1df514ade","label":"Project"},"outVertex":{"id":"64b6e79c-d58b-37ad-6c3e-63a783e6df97","label":"Company"},"id":"4eb6e7a8-451f-4365-1d79-d5d118b5ff56","label":"hasProject"}
{"inVertex":{"id":"c4b6e7ae-64d3-b8b9-ce7b-c319e7ed70ca","label":"Lead"},"outVertex":{"id":"58b6e79f-6809-6fc4-9f0a-c8a26337a729","label":"Campaign"},"id":"96b6e7ae-64d4-b918-353d-fccc13cbd9bb","label":"hasLead"}
{"inVertex":{"id":"94b6e79f-69b9-ccfe-d9e6-a41c4be59979","label":"Campaign"},"outVertex":{"id":"a2b6e79d-1ca7-db4e-f19f-2ef1df514ade","label":"Project"},"id":"34b6e79f-69b9-bd15-9331-0551c464f222","label":"hasCampaign"}
{"inVertex":{"id":"36b6e7b2-3d78-3229-9ebd-05c2c5f5927b","label":"Lead"},"outVertex":{"id":"1cb6e79d-1ca7-4d3c-e71d-c05c13abac15","label":"Project"},"id":"c2b6e7b2-3d78-16d0-ee95-46b66108236e","label":"hasLead"}
{"inVertex":{"id":"36b6e7b2-3d78-3229-9ebd-05c2c5f5927b","label":"Lead"},"outVertex":{"id":"58b6e79f-6809-6fc4-9f0a-c8a26337a729","label":"Campaign"},"id":"04b6e7b2-3d79-3d95-855e-a206d38b8603","label":"hasLead"}
{"inVertex":{"id":"1cb6e79d-1ca7-4d3c-e71d-c05c13abac15","label":"Project"},"outVertex":{"id":"64b6e79c-d58b-37ad-6c3e-63a783e6df97","label":"Company"},"id":"ccb6e7a8-449b-d92c-1330-4f6288ab0852","label":"hasProject"}
{"inVertex":{"id":"7eb6e7dc-94f9-ca83-df4c-87284897151f","label":"Lead"},"outVertex":{"id":"1cb6e79d-1ca7-4d3c-e71d-c05c13abac15","label":"Project"},"id":"d2b6e7dc-94fb-8219-30a2-d304ccaed75d","label":"hasLead"}
{"inVertex":{"id":"d0b6e79f-692a-8c03-112e-9388e54b1f9d","label":"Campaign"},"outVertex":{"id":"1cb6e79d-1ca7-4d3c-e71d-c05c13abac15","label":"Project"},"id":"3cb6e79f-692b-4cad-48ac-1e4991e75b60","label":"hasCampaign"}
I am running the query below to fetch the Leads associated with a Project and a specific Campaign.
GraphTraversal t =g.V("1cb6e79d-1ca7-4d3c-e71d-c05c13abac15").out("hasLead")
.where(in("hasLead").has("Campaign","name","Campaign1V"));
This returns the information about the Leads. Is there a way to also get the matching Campaign information, as well as the IDs, in the output (using a single traversal) so that a UI component can use it to render HTML?
You just need to transform your result into the form that you want. In this case you can use something like project():
g.V("1cb6e79d-1ca7-4d3c-e71d-c05c13abac15").out("hasLead").
where(__.in("hasLead").has("Campaign","name","Campaign1V")).
project('lead','campaign').
by().
by(__.in("hasLead").has("Campaign","name","Campaign1V").fold())
You might want to include something in that first by() modulator to further transform the vertex to the properties you want and you might want to do the same for the second as well. Furthermore, the fold() is only necessary if you have more than one campaign per lead.
So, the above works nicely and is easy to follow, but it traverses the same "hasLead" edges twice. You can avoid that, but at some cost to readability, so you have to decide whether you can live with it:
g.V("1cb6e79d-1ca7-4d3c-e71d-c05c13abac15").out("hasLead").
project('lead','campaign').
by().
by(__.in("hasLead").has("Campaign","name","Campaign1V").fold()).
filter(select('campaign').unfold())
Now you project all of the "leads" but filter away any that have an empty list for the "campaign".
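The single-pass project()/filter() pattern above can be modeled in plain Java to show the shape of the result: each lead of the project becomes a row holding the lead plus a folded list of matching campaigns, and rows whose list is empty are dropped. The maps, the ids (P1, L1, C1, ...) and the class name are hypothetical; in practice the Gremlin traversal does this work on the graph side.

```java
import java.util.*;

final class LeadProjection {
    /** One result row, mirroring project('lead','campaign'). */
    static final class Row {
        final String lead;
        final List<String> campaigns;
        Row(String lead, List<String> campaigns) { this.lead = lead; this.campaigns = campaigns; }
    }

    static List<Row> leadsWithCampaign(Map<String, List<String>> projectLeads,   // project -> leads
                                       Map<String, List<String>> campaignLeads,  // campaign -> leads
                                       Map<String, String> campaignNames,        // campaign -> name
                                       String projectId, String campaignName) {
        List<Row> rows = new ArrayList<>();
        for (String lead : projectLeads.getOrDefault(projectId, Collections.emptyList())) { // out("hasLead")
            List<String> campaigns = new ArrayList<>();                                       // fold()
            for (Map.Entry<String, List<String>> e : campaignLeads.entrySet()) {             // in("hasLead")
                if (e.getValue().contains(lead) && campaignName.equals(campaignNames.get(e.getKey()))) {
                    campaigns.add(e.getKey());                                                // has(..., "name", ...)
                }
            }
            if (!campaigns.isEmpty()) rows.add(new Row(lead, campaigns)); // filter(select('campaign').unfold())
        }
        return rows;
    }
}
```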

Graph algorithm to find the most likely ancestor of a node

I'm working on the Wikipedia Category Graph (WCG). In the WCG, each article is associated with multiple categories.
For example, the article "Lists_of_Israeli_footballers" is linked to multiple categories, such as:
Lists of association football players by nationality - Israeli footballers - Association football in Israel lists
Now, if you climb back up the category tree, you are likely to find many paths leading up to the "Football" category, but there is also at least one path leading up to "Science", for example.
This is problematic because my final goal is to determine whether or not an article belongs to a given category using the list of categories it's linked with. Right now, a simple ancestor search gives false positives (for example, it identifies "Israeli footballers" as part of the "Science" category, which is obviously not the expected result).
I want an algorithm able to find out what the most likely ancestor is.
I thought about two main solutions:
Count the number of distinct paths in the WCG linking the article's category vertices to the candidate ancestor category (and compare against the number of paths linking to other categories of the same depth)
Use some kind of clustering algorithm and run ancestor search queries in isolated parts of the graph
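The first option (counting distinct paths) can be made much cheaper with memoization: each category's path count toward one fixed candidate ancestor is computed once and reused across articles. A minimal sketch, assuming a hypothetical category → parents map and one PathCounter per candidate ancestor; the placeholder written before recursing only prevents infinite loops, so counts are exact only when the graph is acyclic, which the real category graph is not guaranteed to be.

```java
import java.util.*;

/** Counts distinct upward paths from a category to one candidate ancestor, memoized per category. */
final class PathCounter {
    private final Map<String, List<String>> parents; // category -> its parent categories
    private final String ancestor;
    private final Map<String, Long> memo = new HashMap<>();

    PathCounter(Map<String, List<String>> parents, String ancestor) {
        this.parents = parents;
        this.ancestor = ancestor;
    }

    long countPaths(String category) {
        if (category.equals(ancestor)) return 1L;
        Long cached = memo.get(category);
        if (cached != null) return cached;
        memo.put(category, 0L); // placeholder: stops cycles; exact only on acyclic graphs
        long total = 0;
        for (String parent : parents.getOrDefault(category, Collections.emptyList())) {
            total += countPaths(parent);
        }
        memo.put(category, total);
        return total;
    }

    /** Score for an article: sum of path counts over all of its categories. */
    long score(List<String> articleCategories) {
        long sum = 0;
        for (String c : articleCategories) sum += countPaths(c);
        return sum;
    }
}
```

Comparing the score for "Football" against the score for "Science" (each with its own PathCounter) gives the kind of relative evidence the first option describes.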
The issue with those options is that they seem very costly given the size of the WCG (2 million vertices, and even more edges). Ultimately, I could accept a preprocessing step that runs in O(n) or more if it makes later queries O(1), but I need the queries to be very fast overall.
Are there existing solutions to my problem? I'm open to all suggestions.
No problem, thanks for clarifying. Anything like clustering is probably not a good idea, because those types of algorithms are meant to determine a category for an object that is not yet associated with a category. In your problem, every object (e.g. a footballer article) is already associated with several categories.
You should probably do a complete search through all articles and save the matched categories with each article in a hash table so that you can then retrieve this category information when you need to know this for a new article.
Whether or not a category is relevant for an article seems entirely arbitrary to me, and is something you should decide for yourself (e.g. set a threshold of 5 links to a category before the article is considered part of that category).
If you're getting these articles from Wikipedia, working through the entire tree will probably take a long time, but in my opinion it is your only choice.
Search with DFS, and each time you find an article-category match, save the article in a hash table (you need to be able to reduce an article to a unique identifier).
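The DFS-plus-hash-table suggestion can be sketched as a one-off preprocessing pass that stores every ancestor category of each article in a HashMap, so that a later membership check is a single hash lookup. The class name and map layout are assumptions. On its own this index reproduces the false positive from the question (the article still reaches "Science"); the threshold idea above, for example applied to path counts, is what has to filter those out. Memory grows with the number of (article, ancestor) pairs, which may be large for the full WCG.

```java
import java.util.*;

/** Precomputed article -> ancestor categories, built with an iterative DFS per article. */
final class AncestorIndex {
    private final Map<String, Set<String>> ancestorsByArticle = new HashMap<>();

    AncestorIndex(Map<String, List<String>> articleCategories, Map<String, List<String>> parents) {
        for (Map.Entry<String, List<String>> e : articleCategories.entrySet()) {
            Set<String> seen = new HashSet<>();                    // also protects against cycles
            Deque<String> stack = new ArrayDeque<>(e.getValue());  // start from the article's own categories
            while (!stack.isEmpty()) {
                String category = stack.pop();
                if (seen.add(category)) {
                    for (String p : parents.getOrDefault(category, Collections.emptyList())) stack.push(p);
                }
            }
            ancestorsByArticle.put(e.getKey(), seen);
        }
    }

    /** Average O(1) lookup after preprocessing. */
    boolean belongsTo(String article, String category) {
        return ancestorsByArticle.getOrDefault(article, Collections.emptySet()).contains(category);
    }
}
```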
This is probably my most vague answer I've ever posted here, and your question might be too broad... if you're not helped with this please let me know so I can consider removing it in order to avoid confusion with future readers.

Roommate matching algorithm

I'm currently in an advanced data structures class and have learned a good deal about graphs. This summer, I was asked to help write an algorithm to match roommates. For my data structures class, I've written a City Path graph that performs some sorting and Prim's algorithm, and I'm thinking a graph may be a great place to start with my roommate matching algorithm.
I was thinking that our database could just be a text file, nothing too fancy. I could then initialize each node in the graph as a student, and each student would have an undirected edge to many other students (no edge between two students if one of them doesn't want to room with the other, and no edge between former roommates, since the sorority also doesn't want repeat roommates). I could also increase the edge weights depending on shared special interests.
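A minimal sketch of the graph described above, assuming each student's interests are stored as a set and the edge weights are derived from them rather than edited by hand: weight = 1 + number of shared interests, and no edge at all for excluded pairs. The weighting formula, the class name and the input layout are hypothetical choices for illustration.

```java
import java.util.*;

/** Undirected, weighted compatibility graph: one node per student. */
final class RoommateGraph {
    // weights.get(a).get(b) is the weight of the undirected edge a-b (stored in both directions)
    private final Map<String, Map<String, Integer>> weights = new HashMap<>();

    RoommateGraph(Map<String, Set<String>> interests, Set<Set<String>> excludedPairs) {
        List<String> students = new ArrayList<>(interests.keySet());
        for (int i = 0; i < students.size(); i++) {
            for (int j = i + 1; j < students.size(); j++) {
                String a = students.get(i), b = students.get(j);
                if (excludedPairs.contains(new HashSet<>(Arrays.asList(a, b)))) continue; // no edge
                Set<String> shared = new HashSet<>(interests.get(a));
                shared.retainAll(interests.get(b));
                int w = 1 + shared.size();      // base weight plus one per shared interest
                weights.computeIfAbsent(a, k -> new HashMap<>()).put(b, w);
                weights.computeIfAbsent(b, k -> new HashMap<>()).put(a, w);
            }
        }
    }

    /** Edge weight, or null when there is no edge between the two students. */
    Integer weight(String a, String b) {
        return weights.getOrDefault(a, Collections.emptyMap()).get(b);
    }
}
```

Because weights are recomputed from the interest sets, updating a student's survey answers only means editing that student's interests and rebuilding the graph.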
Everything listed above is quite simple and I don't think I'll run into any problem implementing it. But here is my question:
How should I update the common interest field? Should I start that with a physical survey and then go back into the text file and update the weight of the edge manually? Or should I be creating a field that keeps track of the matching interests?
What you're trying to design is a matching problem. Fortunately, unlike general matching algorithms, you won't need fancy graph algorithms or a complex implementation for this. It is very close to the Stable Marriage Problem (strictly, when all students come from a single pool it is the Stable Roommates Problem), and surprisingly there are very effective, even simpler algorithms for it.
If you are interested, I can share my C++ implementation of stable marriage problem.
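For reference, here is a Java sketch of the Gale-Shapley algorithm for the classic stable marriage problem (two equal-sized sides, complete preference lists given as index arrays). It is not the answerer's C++ implementation, and the class and method names are hypothetical. It assumes two separate groups; for a single pool of students the stable roommates problem applies instead, where a stable matching may not exist and Irving's algorithm is the usual approach.

```java
import java.util.*;

final class StableMarriage {
    /**
     * proposerPrefs[i] lists receiver indices, most preferred first; receiverPrefs[j] likewise
     * lists proposer indices. Returns match[i] = receiver assigned to proposer i.
     */
    static int[] match(int[][] proposerPrefs, int[][] receiverPrefs) {
        int n = proposerPrefs.length;
        int[][] rank = new int[n][n]; // rank[j][i] = position of proposer i in receiver j's list
        for (int j = 0; j < n; j++)
            for (int r = 0; r < n; r++) rank[j][receiverPrefs[j][r]] = r;

        int[] next = new int[n];            // next receiver each proposer will propose to
        int[] receiverPartner = new int[n];
        Arrays.fill(receiverPartner, -1);   // -1 = receiver not yet engaged
        Deque<Integer> free = new ArrayDeque<>();
        for (int i = 0; i < n; i++) free.add(i);

        while (!free.isEmpty()) {
            int i = free.poll();
            int j = proposerPrefs[i][next[i]++];
            int current = receiverPartner[j];
            if (current == -1) {
                receiverPartner[j] = i;                 // receiver was free: accept
            } else if (rank[j][i] < rank[j][current]) {
                receiverPartner[j] = i;                 // receiver prefers the new proposer
                free.add(current);                      // jilted proposer tries again later
            } else {
                free.add(i);                            // rejected: try next choice later
            }
        }
        int[] match = new int[n];
        for (int j = 0; j < n; j++) match[receiverPartner[j]] = j;
        return match;
    }
}
```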

Draw nodes in e.g. a Chord ring

I have a set of nodes that I would like to put into a ring. They all have a numeric property which I would like to use as a reference when placing them in the ring.
E.g., the node with param 32 comes after the node with param 22.
What I really need is a library (or something like that) which can make it possible to have the correct "distance" between the nodes, e.g: between 22 and 32 is 10 "units", and between 32 and 35 is 3 "units" where "units" may be an empty numeric slot.
Sounds like you need a sorted list where the end links to the start. I know of no standard implementation, but it would be pretty easy to implement one yourself.
Something like a doubly linked list with the head and tail connected would work. Add operations would have to traverse the list to find the appropriate position to insert into, making insert an O(n) operation. This would make your list perform relatively poorly, with pretty much all standard list operations being O(n).
You could implement a distanceToNext and/or distanceToPrevious pretty easily by just getting the values of the current and next/previous nodes and returning the difference.
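The circular, sorted, doubly linked list described above can be sketched as follows. The ringSize parameter is an assumption not in the answer (for a Chord ring it would be the size of the identifier space); it makes the distance from the largest value back around to the smallest wrap instead of going negative.

```java
/** Sorted circular doubly linked list: the node after the largest value is the smallest. */
final class SortedRing {
    static final class Node {
        final int value;
        Node next, prev;

        Node(int value) { this.value = value; }

        /** Units from this node forward to the next one, wrapping at ringSize. */
        int distanceToNext(int ringSize) { return Math.floorMod(next.value - value, ringSize); }

        /** Units from the previous node forward to this one, wrapping at ringSize. */
        int distanceToPrevious(int ringSize) { return Math.floorMod(value - prev.value, ringSize); }
    }

    Node head; // node with the smallest value, or null when empty

    /** O(n) insert: walk from the head to the first larger value and link the new node before it. */
    Node insert(int value) {
        Node node = new Node(value);
        if (head == null) {
            node.next = node;
            node.prev = node;
            head = node;
            return node;
        }
        Node cur = head;
        do {
            if (cur.value > value) break;
            cur = cur.next;
        } while (cur != head);
        // Link between cur.prev and cur; if nothing is larger, cur == head and the node becomes the tail
        node.next = cur;
        node.prev = cur.prev;
        cur.prev.next = node;
        cur.prev = node;
        if (value < head.value) head = node;
        return node;
    }
}
```

Inserting 32, 22 and 35 into a ring of size 64 gives 22 → 32 → 35 → 22, with distances 10, 3 and 51.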
Edit:
Just realised from the question title that you are probably looking for some GUI library to draw these and I just hinted at the model you might use. I'll have a think about the GUI.
Edit 2:
Your problem boils down to how to draw a polygon when you only know the lengths of its sides. I asked on the Mathematics Stack Exchange for you.
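If the numeric values live in a fixed identifier space, as in a Chord ring of size 2^m (an assumption; the question does not state the ring size), a different approach from the polygon one avoids solving for side lengths: place each node at an angle proportional to its value, so the arc between two neighbors is proportional to their distance in "units". A minimal sketch; the method name and the screen-coordinate convention (y grows downward, 0 at the top, values increasing clockwise) are assumptions.

```java
/** Maps ring identifiers to points on a circle. */
final class RingLayout {
    /** Returns {x, y} for id on a circle of the given radius centered at (cx, cy). */
    static double[] position(long id, long ringSize, double cx, double cy, double radius) {
        double angle = 2 * Math.PI * Math.floorMod(id, ringSize) / ringSize; // fraction of a full turn
        return new double[] { cx + radius * Math.sin(angle), cy - radius * Math.cos(angle) };
    }
}
```

The returned coordinates can then be handed to whatever GUI library is used for drawing.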

Why does A* path finding sometimes go in straight lines and sometimes diagonals? (Java)

I'm in the process of developing a simple 2d grid based sim game, and have fully functional path finding.
I used the answer found in my previous question as my basis for implementing A* path finding. (Pathfinding 2D Java game?).
To show what I'm really asking, I need to show you this screen capture video that I made.
I was just testing to see how the person would move to a location and back again, and this was the result...
http://www.screenjelly.com/watch/Bd7d7pObyFo
Different choice of path depending on the direction, an unexpected result. Any ideas?
If you're looking for a simple-ish solution, may I suggest a bit of randomization?
What I mean is this: in the cokeandcode code example, there are nested for-loops that generate the "successor states" (to use the AI term). I mean the point where it loops over the 3x3 square around the "current" state, adding new locations to the pile to consider.
A relatively simple fix would (should :)) be to isolate that code a bit and have it, say, generate a linked list of nodes before the rest of the processing step. Then shuffle that list with Collections.shuffle and continue the processing from there. Basically, have a routine, say,
"createNaiveNeighbors(node)"
that returns a LinkedList = {(node.x-1, node.y), (node.x, node.y-1), ...} (please pardon the pidgin Java; I'm trying, and always failing, to be brief).
Once you build the linked list, however, you should just be able to do a "for (Node n : myNewLinkedList)" instead of the
for (int x=-1;x<2;x++) {
for (int y=-1;y<2;y++) {
And still use the exact same body code!
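A sketch of the shuffled-neighbor idea, assuming a hypothetical Node class with x and y fields; the real cokeandcode finder has its own types and bounds checks, which would stay in the existing loop body. It uses an ArrayList, which Collections.shuffle handles efficiently (a LinkedList works too). Passing a Random seeded the same way for each search makes the order repeatable, which may soften the downside mentioned below.

```java
import java.util.*;

final class Neighbors {
    /** Hypothetical grid cell type; the real path finder uses its own classes. */
    static final class Node {
        final int x, y;
        Node(int x, int y) { this.x = x; this.y = y; }
    }

    /** The 8 cells around node, in random order, replacing the fixed nested-loop order. */
    static List<Node> createNaiveNeighbors(Node node, Random rnd) {
        List<Node> neighbors = new ArrayList<>(8);
        for (int dx = -1; dx <= 1; dx++) {
            for (int dy = -1; dy <= 1; dy++) {
                if (dx == 0 && dy == 0) continue; // skip the node itself
                neighbors.add(new Node(node.x + dx, node.y + dy));
            }
        }
        Collections.shuffle(neighbors, rnd);
        return neighbors;
    }
}
```

The search loop then becomes for (Neighbors.Node n : Neighbors.createNaiveNeighbors(current, rnd)) { ... } with the old body unchanged.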
What this would ideally do is "shake up" the order in which nodes are considered and produce paths closer to the diagonal, without having to change the heuristic. The paths will still be the most efficient ones.
The downside is, of course, that if you go from A to B multiple times, a different path may be taken. If that is unacceptable, you may need to consider a more drastic modification.
Hope this helps!
-Agor
Both paths are the same length, so the algorithm is doing its job just fine: it's finding a shortest path. However, the A* algorithm doesn't specify WHICH shortest path it will take. Implementations normally take the "first" shortest path. Without seeing yours, it's impossible to know exactly why, but if you want the same results each time, you're going to have to add priority rules of some sort (so that your desired path comes up first in the search).
The reason why is actually pretty simple: the path will always try to have the lowest heuristic possible because it searches in a greedy manner. Going closer to the goal is an optimal path.
If you allowed diagonal movement, this wouldn't happen.
The reason lies in the path you want the algorithm to take.
I don't know the heuristic your A* uses but in the first case it has to go to the end of the tunnel first and then plans the way from the end of the tunnel to the target.
In the second case, the simplest moves toward the target are going down until it hits the wall, and then it plans the way from the wall to the target.
Most A* implementations I know use a line-of-sight heuristic, or Manhattan distance in the case of a block world. These heuristics give you the shortest way, but when obstacles force a route that differs from the line of sight, the route depends on your starting point.
The algorithm will go the line of sight as long as possible.
The most likely answer is that going straight south gets it closest to its goal first; going the opposite way, this is not a choice, so it optimizes the sub-path piecewise with the result that alternating up/across moves are seen as best.
If you want it to go along the diagonal going back, you are going to have to identify some points of interest along the path (for example the mouth of the tunnel) and take those into account in your heuristic. Alternatively, you could take them into account in your algorithm by re-computing any sub-path that passes through a point of interest.
Back in the day they used to do a pre-compiled static analysis of maps and placed pathfinding markers at chokepoints. Depending on what your final target is, that might be a good idea here as well.
If you're really interested in learning what's going on, I'd suggest rendering the steps of the A* search. Given your question, it might be very eye-opening for you.
In each case it's preferring the path that takes it closer to its goal node sooner, which is what A* is designed for.
If I saw right, the sphere first moves to the right in a straight line, because it cannot go directly toward the goal (the path is blocked).
Then, it goes in a straight line toward the goal. It only looks diagonal.
Does your search look in the 'down' direction first? This might explain the algorithm. Try changing it to look 'up' first and I bet you would see the opposite behavior.
Depending on the implementation of your A*, you will see different results with the same heuristic, as many people have mentioned. This is because of ties: when two or more paths tie, the way you order your open set determines what the final path looks like. You will always get an optimal path if you have an admissible heuristic, but the number of nodes visited will increase with the number of ties you have (relative to a heuristic that produces fewer ties).
If you don't think visiting more nodes is a problem, I would suggest the randomization suggestion (which is your current accepted answer). If you think searching more nodes is a problem and want to optimize, I would suggest using some sort of tie-breaker. It seems you are using Manhattan distance; if you use Euclidean distance as a tie-breaker when two nodes tie, you will get straighter paths to the goal and visit fewer nodes. This of course assumes no traps or obstacles blocking the line of sight to the goal.
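The Euclidean tie-breaker suggested above can be expressed as a comparator for the open set: nodes are ordered by f = g + h with a Manhattan h, and equal f values go to the node that is closer to the goal in a straight line. A minimal sketch; the Node class and method names are hypothetical. The tie-breaker only reorders nodes whose f values are equal, so the primary A* ordering is unchanged.

```java
import java.util.*;

final class TieBreak {
    /** Hypothetical open-set entry: grid position plus cost so far (g). */
    static final class Node {
        final int x, y, g;
        Node(int x, int y, int g) { this.x = x; this.y = y; this.g = g; }
    }

    static int manhattan(Node n, int gx, int gy) { return Math.abs(n.x - gx) + Math.abs(n.y - gy); }

    static double euclidean(Node n, int gx, int gy) { return Math.hypot(n.x - gx, n.y - gy); }

    /** Orders by f = g + Manhattan h; on equal f, prefers the node closer to the goal in a straight line. */
    static Comparator<Node> openSetOrder(int gx, int gy) {
        return Comparator.<Node>comparingInt(n -> n.g + manhattan(n, gx, gy))
                         .thenComparingDouble(n -> euclidean(n, gx, gy));
    }
}
```

Usage: new PriorityQueue<>(TieBreak.openSetOrder(goalX, goalY)) in place of an open set ordered by f alone.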
To avoid visiting nodes with blocking elements in the line-of-sight path, I would suggest finding a heuristic that takes these blocking elements into account. Of course, a new heuristic shouldn't do more work than a normal A* search would.
I would suggest looking at my question as it might produce some ideas and solutions to this problem.
