Why does my Gremlin traversal add only one edge?

As described in another question, I am attempting to attach several "identity" vertices to a "group" vertex. Based on the recipe recommendation, I'm trying to write the traversal so that the traversers iterate the identity vertices, rather than appending extra steps in a loop. Here's what I have:
gts.V(group)
.addE('includes')
.to(V(identityIds as Object[]))
.count().next()
This always returns a value of 1, no matter how many IDs I pass in identityIds, and only a single edge is created. The profile indicates that only a single traverser is created for the __.V even though I'm passing multiple values:
Traversal Metrics
Step                                                   Count  Traversers  Time (ms)  % Dur
==========================================================================================
TinkerGraphStep(vertex,[849e1878-86ad-491e-b9f9...         1           1      0.633  40.89
AddEdgeStep({label=[Includes], ~to=[[TinkerGrap...         1           1      0.915  59.11
  TinkerGraphStep(vertex,[50d9bb4f-ed0d-493d-bf...         1           1      0.135
                                            >TOTAL         -           -      1.548      -
Why is only a single edge added to the first vertex?

The to() syntax you are using is not quite right. A modulator like to() expects the traversal you provide it to produce a single Vertex, not a list. So, given V(identityIds), only the first vertex returned from that list of ids will be used to construct the edge. Step modulators like to(), by(), etc. tend to work like that.
You would want to reverse your approach to:
gts.V(identityIds)
.addE('includes')
.from(V(group))
.count().next()
But perhaps that leads back to your other question.
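If you would rather keep the group vertex at the start of the traversal, a mid-traversal V() with a step label should also work. This is a sketch along the same lines, not something from the original answer:

gts.V(group).as('g')
   .V(identityIds as Object[])
   .addE('includes')
   .from('g')
   .count().next()

Here each identity vertex becomes its own traverser, so one edge is created per id, and count() reports the number of edges added.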

Related

How to remove nodes with the Java API for GraphStream?

I currently use the GraphStream API for Java in my project.
I want to delete or add nodes on command.
With JFrame & co. I initialized a console so I can just enter
"addNode()" or "removeNode(id)" to get the result.
An interface shows the nodes with a number next to them (the ID).
When I delete one node, I want all nodes with a higher ID to change their ID,
but I have not yet figured out a way to change the ID of one node.
For example, I have:
graph.addNode(0);
graph.addNode(1);
graph.addNode(2);
When deleting a Node:
graph.removeNode(0);
I want 1, 2 to be changed to 0, 1 without reinitializing the complete graph.
Is there a way to achieve this behaviour? I thought about something like:
graph.getNode(1).setID(0);
Unfortunately I only have access to .getID() and can't manipulate the ID this way.
Thanks
Node ids are strings and they are immutable (no renaming, no setId()).
Now, what you are doing in your example is different: you are using index-based access to the nodes. Indices are integers and correspond to arbitrary nodes in the graph; they are not associated with the ids.
When you do graph.addNode(0), the integer is converted to the string "0". Then when you do graph.removeNode(0), you are removing the node that is indexed as the first in the list of nodes, but it does not have to be the node with id "0".
You can remove the node with index (integer) 0 as long as there are nodes in the graph (graph.removeNode(0)), but you can only remove the one node with id "0" once (graph.removeNode("0")).
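A minimal sketch of the difference (assuming GraphStream 1.x; the class name IdVsIndex is just for illustration):

import org.graphstream.graph.Graph;
import org.graphstream.graph.Node;
import org.graphstream.graph.implementations.SingleGraph;

public class IdVsIndex {
    public static void main(String[] args) {
        Graph graph = new SingleGraph("demo");
        graph.addNode("0");
        graph.addNode("1");
        graph.addNode("2");

        // Index-based removal: whatever node currently sits at index 0,
        // not necessarily the node with id "0".
        Node removedByIndex = graph.removeNode(0);
        System.out.println("removed id: " + removedByIndex.getId());

        // Id-based removal: this exact node, and it only works once.
        graph.removeNode("2");
    }
}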

What data structure should be used in a Neo4j database to store combinations of elements?

I have combinations of data to save in a database. For example: A+B+C is one combination; B+C+D is another.
Conditions:
1. A+B+C is the same as B+A+C, C+B+A, etc.
2. Each node will have an attribute called "weight", which depends on the combination (in the A+B+C combination, A is 5g, B is 6g and C is 7g). Please note that the third node will also have a weight, hence "weight" cannot be a relationship property between consecutive elements.
Issues:
First: I have decided to go with a graph database, but I don't know how to meet the above conditions. If I go with an undirected graph, A-B-C is a combination, but it can't return B-A-C as the same combination, since there is no connection from A to C.
Second: "weight" can't be a property on a node, because the weight differs per combination. It also can't be a relationship property, since the last node also has a weight to be considered.
Please help me on this.
Each combination can have a Combination node with WEIGHT relationships to the nodes in that combination.
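For instance, a minimal sketch with the Neo4j Java driver (the connection details, labels and property names are illustrative assumptions, not part of the original answer):

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;

public class SaveCombination {
    public static void main(String[] args) {
        // One Combination node per combination. Membership is a set, so
        // A+B+C and B+A+C are the same node, and each WEIGHT relationship
        // carries that element's weight within this particular combination.
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {
            session.run(
                "MERGE (a:Element {name: 'A'}) " +
                "MERGE (b:Element {name: 'B'}) " +
                "MERGE (c:Element {name: 'C'}) " +
                "CREATE (combo:Combination {id: 'ABC'}) " +
                "CREATE (combo)-[:WEIGHT {grams: 5}]->(a) " +
                "CREATE (combo)-[:WEIGHT {grams: 6}]->(b) " +
                "CREATE (combo)-[:WEIGHT {grams: 7}]->(c)");
        }
    }
}

Because the weight lives on the relationship between the Combination node and each Element, the same element can carry different weights in different combinations, which addresses the second condition.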

Is it possible to create parallel operations inside one partition of Spark?

I am new to Spark and its relevant concepts, so please be kind and help me clear up my doubts. I'll give you an example to help you understand my question.
I have one JavaPairRDD "rdd" which contains tuples like
Tuple2 <Integer,String[]>
Let's assume that String[].length == 3, meaning each tuple contains 3 elements besides the key. What I want to do is update each element of the vector using 3 RDDs and 3 operations: "R1" and "operation1" are used to modify the first element, "R2" and "operation2" are used to modify the second element, and "R3" and "operation3" are used to modify the third element.
R1, R2 and R3 are the RDDs that provide the new values of the elements.
I know that Spark divides the data (in this example, "rdd") into many partitions, but what I am asking is: is it possible to do different operations in the same partition at the same time?
According to my example, because I have 3 operations, I could take 3 tuples at a time instead of taking only one to operate on.
The treatment that I want is (t refers to time):
at t=0:
*tuple1 = use operation1 to modify element 1
*tuple2 = use operation2 to modify element 2
*tuple3 = use operation3 to modify element 3
at t=1:
*tuple1 = use operation2 to modify element 2
*tuple2 = use operation3 to modify element 3
*tuple3 = use operation1 to modify element 1
at t=2:
*tuple1 = use operation3 to modify element 3
*tuple2 = use operation1 to modify element 1
*tuple3 = use operation2 to modify element 2
After finishing updating the first 3 tuples, I take another 3 tuples from the same partition to treat them, and so on.
Please be kind; it's just a thought that crossed my mind, and I want to know whether it is possible to do it or not. Thank you for your help.
Spark doesn't guarantee the order of execution.
You decide how individual elements of RDD should be transformed and Spark is responsible for applying the transformation to all elements in a way that it decides is the most efficient.
Depending on how many executors (i.e. threads, or servers, or both) are available in your environment, Spark will actually process as many tuples as possible at the same time.
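To make that concrete, here is a minimal sketch in Java (the per-element operations are stand-ins; the point is that you describe how one tuple is transformed and Spark decides how many tuples to process concurrently):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class ParallelElements {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("demo").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<Integer, String[]> rdd = sc.parallelizePairs(Arrays.asList(
                new Tuple2<>(1, new String[]{"a", "b", "c"}),
                new Tuple2<>(2, new String[]{"d", "e", "f"})));

            // One function describes the update of all three elements of a
            // tuple; Spark runs it on many tuples in parallel across cores.
            JavaPairRDD<Integer, String[]> updated = rdd.mapToPair(t -> {
                String[] v = t._2().clone();
                v[0] = v[0].toUpperCase();  // stand-in for operation1
                v[1] = v[1] + "!";          // stand-in for operation2
                v[2] = v[2] + "?";          // stand-in for operation3
                return new Tuple2<>(t._1(), v);
            });

            updated.collect().forEach(t ->
                System.out.println(t._1() + " -> " + Arrays.toString(t._2())));
        }
    }
}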
First of all, welcome to the Spark community.
To add to @Tomasz Błachut's answer, Spark's execution context does not identify nodes (e.g. one computing PC) as individual processing units but rather their cores. Therefore, one job may be assigned to two cores on a 22-core Xeon instead of the whole node.
Spark's execution context does consider nodes as computing units when it comes to their efficiency and performance, though, as this is relevant for dividing bigger jobs among nodes of varying performance or blacklisting them if they are slow or fail often.

Allocate Algorithm/Availability Algorithm

I am wondering if anyone knows of an Algorithm I could use to help me solve the following problem:
Allocate people (n) to certain events (m); each m can have only one person attached to it, and the allocation must be randomized each time (the same person is allowed if only one option (n) is available). Each n has properties such as time available and day available. For an n to be matched to an m, the time available and day available must match for both n and m. There can be multiple n that match the times of an m, but the best fit has to be chosen so that the rest of the m's can be allocated. The diagram below will more than likely explain it better (sorry). An n can be allocated to more than one m, but this should be done fairly, such that one n doesn't take all of the available m's.
As you can see, Person A could be attached to Event A, but due to the need to have them all matching (the best attempt to match), Person A is attached to Event B, allowing Person C to be allocated to Event A and Person B to Event C.
I am simply wondering if anyone knows the name of this type of problem and how I could go about solving it. I am coding the program in Java.
This is a variant of the Max Flow Problem. There are many algorithms tailor-made to solve max-flow problems, including the Ford-Fulkerson Algorithm and its refinement, the Edmonds-Karp Algorithm. Once you are able to change your problem into a max-flow problem, solving it is fairly simple. But what is the max-flow problem?
The problem takes in a weighted, directed graph and asks the question "What is the maximum amount of flow that can be directed from the source (a node) to the sink (another node)?".
There are a few constraints that make logical sense when thinking of the graph as a network of water flows.
The amount of flow through each edge must be less than or equal to the "capacity" (weight) of that edge, for every edge in the graph. Flow amounts must also be non-negative.
The amount of flow into each node must equal the amount of flow leaving that node, for every node except the source and the sink. There is no limit to the amount of flow that passes through a node.
Consider the following graph, with s as the source and t as the sink.
The solution to the max flow problem would be a total flow of 25, with the following flow amounts:
It is simple to transform your problem into a max flow problem. Assuming your inputs are:
N people, plus associated information on when person p_i is available, time- and date-wise.
M events with a time and place.
Create a graph with the following properties:
A super source s
N person nodes p_1 ... p_n, with an edge of capacity infinity connecting s to p_i for all i in 1 ... n.
A super sink t
M event nodes e_1 ... e_m, with an edge of capacity 1 connecting e_i to t for all i in 1 ... m.
For every combination of a person and an event (p_i, e_j), an edge with capacity infinity connecting p_i to e_j iff person p_i can validly attend event e_j (otherwise no edge connecting p_i and e_j).
Constructing a graph to these specifications has O(1) + O(N) + O(N) + O(M) + O(M) + O(1) + O(NM) = O(NM) runtime.
For your example, the constructed graph would look like the following (with unlabeled edges having capacity infinity):
You correctly noticed that there is a max flow with value 4 for this example, and any max flow would return the same value. Once you can perform this transformation, you can run any max-flow algorithm and solve your problem.
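As a sketch of the whole pipeline, here is a compact Edmonds-Karp implementation on an adjacency matrix; the node numbering and the availability data are assumptions for illustration, not from the original answer:

import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class EventAssignment {
    // Edmonds-Karp: repeatedly find the shortest augmenting path by BFS.
    static int maxFlow(int[][] cap, int s, int t) {
        int n = cap.length, flow = 0;
        while (true) {
            int[] parent = new int[n];
            Arrays.fill(parent, -1);
            parent[s] = s;
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(s);
            while (!queue.isEmpty() && parent[t] == -1) {
                int u = queue.poll();
                for (int v = 0; v < n; v++)
                    if (parent[v] == -1 && cap[u][v] > 0) { parent[v] = u; queue.add(v); }
            }
            if (parent[t] == -1) return flow;              // no augmenting path left
            int bottleneck = Integer.MAX_VALUE;
            for (int v = t; v != s; v = parent[v])
                bottleneck = Math.min(bottleneck, cap[parent[v]][v]);
            for (int v = t; v != s; v = parent[v]) {
                cap[parent[v]][v] -= bottleneck;           // consume forward capacity
                cap[v][parent[v]] += bottleneck;           // grow residual capacity
            }
            flow += bottleneck;
        }
    }

    public static void main(String[] args) {
        int people = 3, events = 3, INF = 1_000_000;       // INF stands in for "infinity"
        int s = 0, t = people + events + 1;                // 0=source, 1..3=people, 4..6=events, 7=sink
        int[][] cap = new int[t + 1][t + 1];
        for (int p = 1; p <= people; p++) cap[s][p] = INF;        // source -> person
        for (int e = 1; e <= events; e++) cap[people + e][t] = 1; // event -> sink, one person per event
        // Hypothetical availability: person 1 fits events 1 and 2,
        // person 2 fits event 1, person 3 fits event 3.
        cap[1][people + 1] = INF; cap[1][people + 2] = INF;
        cap[2][people + 1] = INF;
        cap[3][people + 3] = INF;
        System.out.println("events covered: " + maxFlow(cap, s, t)); // prints 3
    }
}

Reading off the assignment afterwards is a matter of checking which person-to-event edges ended up carrying flow (their residual capacity changed).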
Create a class called AllocatePerson that has a Person and a list of Events as an attribute called lsInnerEvents (you have to define the Person class and the Event class first, both with a list of times and days).
In the constructor of AllocatePerson you feed a Person and a list of Events; the constructor cycles through the events and adds to the internal list only the ones that match the parameters of the Person.
The main code creates an AllocatePerson for each Person (one at a time), implementing the following logic:
if the newly created object "objAllocatePerson" has an lsInnerEvents list of size 1, you remove the element contained in lsInnerEvents from the list of Events to allocate and fire a procedure called MaintainEvents(Event removedEvent), passing the event just allocated (the one inside lsInnerEvents).
The function MaintainEvents cycles through the current array of AllocatePersons and removes the "removedEvent" from their lsInnerEvents; if after that the size of an lsInnerEvents is 1, it recursively invokes MaintainEvents() with the newly removed event and removes the new lsInnerEvents entry from the main list of Events to allocate.
At the end of the execution you will have all the associations simply by cycling through the array of AllocatePersons where the lsInnerEvents size is 1.
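A rough sketch of that propagation (the class and method names follow the answer; the field types and the driver code are illustrative assumptions):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AllocatePerson {
    final String person;
    final Set<String> lsInnerEvents;   // events this person can still take
    boolean allocated = false;

    AllocatePerson(String person, Set<String> feasibleEvents) {
        this.person = person;
        this.lsInnerEvents = new HashSet<>(feasibleEvents);
    }
}

public class Propagation {
    // Remove an allocated event from everyone; any person left with exactly
    // one feasible event is locked in, which may trigger further removals.
    static void maintainEvents(String removed, List<AllocatePerson> all, Set<String> pool) {
        for (AllocatePerson ap : all) {
            if (ap.allocated) continue;
            if (ap.lsInnerEvents.remove(removed) && ap.lsInnerEvents.size() == 1) {
                ap.allocated = true;
                String forced = ap.lsInnerEvents.iterator().next();
                pool.remove(forced);
                maintainEvents(forced, all, pool);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> pool = new HashSet<>(Set.of("E1", "E2", "E3"));
        List<AllocatePerson> all = new ArrayList<>();
        all.add(new AllocatePerson("A", Set.of("E1", "E2")));
        all.add(new AllocatePerson("B", Set.of("E1")));

        all.get(1).allocated = true;   // B has only one feasible event: lock it in
        pool.remove("E1");
        maintainEvents("E1", all, pool);

        for (AllocatePerson ap : all)
            System.out.println(ap.person + " -> " + ap.lsInnerEvents); // A -> [E2], B -> [E1]
    }
}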
An approach that you can consider is as follows (a sketch follows the list):
Create Java objects for Persons and Events.
Place all Events in a pool (a Java Collection).
Have each Person select an Event from the pool. As each Person can only select Events on specific days, create a subset of Events that will be in the pool for selection by that Person.
Add the necessary attributes to the Events to ensure that each can only be selected once by a Person.
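A sketch of that pool-based selection (the types, fields and matching rule are illustrative assumptions):

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

record Event(String id, String day, String time) {}
record Person(String name, String day, String time) {}

public class PoolAllocator {
    public static void main(String[] args) {
        List<Event> pool = new ArrayList<>(List.of(
            new Event("E1", "Mon", "09:00"),
            new Event("E2", "Mon", "09:00"),
            new Event("E3", "Tue", "14:00")));
        List<Person> people = List.of(
            new Person("A", "Mon", "09:00"),
            new Person("B", "Tue", "14:00"));
        Random rnd = new Random();

        for (Person p : people) {
            // Subset of the pool this person is available for.
            List<Event> matching = pool.stream()
                .filter(e -> e.day().equals(p.day()) && e.time().equals(p.time()))
                .toList();
            if (matching.isEmpty()) continue;    // nothing feasible left
            Event chosen = matching.get(rnd.nextInt(matching.size()));
            pool.remove(chosen);                 // an event can be selected only once
            System.out.println(p.name() + " -> " + chosen.id());
        }
    }
}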

Most efficient way to check if row exists in grid Java

All,
I am wondering what's the most efficient way to check if a row already exists in a List<Set<Foo>>. A Foo object has a key/value pair (as well as other fields which aren't applicable to this question). Each Set in the List is unique.
As an example:
List[
Set<Foo>[Foo_Key:A, Foo_Value:1][Foo_Key:B, Foo_Value:3][Foo_Key:C, Foo_Value:4]
Set<Foo>[Foo_Key:A, Foo_Value:1][Foo_Key:B, Foo_Value:2][Foo_Key:C, Foo_Value:4]
Set<Foo>[Foo_Key:A, Foo_Value:1][Foo_Key:B, Foo_Value:3][Foo_Key:C, Foo_Value:3]
]
I want to be able to check if a new Set (Ex: Set[Foo_Key:A, Foo_Value:1][Foo_Key:B, Foo_Value:3][Foo_Key:C, Foo_Value:4]) exists in the List.
Each Set could contain anywhere from 1-20 Foo objects. The List can contain anywhere from 1-100,000 Sets. Foos are not guaranteed to be in the same order in each Set (so they would have to be pre-sorted into a consistent order somehow, like with a TreeSet).
Idea 1: Would it make more sense to turn this into a matrix, where each column is a Foo_Key and each row contains the Foo_Values?
Ex:
A B C
-----
1 3 4
1 2 4
1 3 3
And then look for a row containing the new values?
Idea 2: Would it make more sense to create a hash of each Set and then compare it to the hash of a new Set?
Is there a more efficient way I'm not thinking of?
Thanks
If you use TreeSets for your Sets, can't you just do list.contains(set), since a TreeSet will handle the equals check?
Also, consider using Guava's Multiset class.
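For what it's worth, Set.equals() is order-insensitive for every Set implementation, so a containment check works without pre-sorting. A minimal sketch, assuming Foo is a simple key/value type with value-based equals/hashCode (modeled here as a record):

import java.util.HashSet;
import java.util.Set;

record Foo(String key, int value) {}

public class RowLookup {
    public static void main(String[] args) {
        // A HashSet of rows makes the membership check roughly O(1),
        // versus scanning the whole List with List.contains().
        Set<Set<Foo>> rows = new HashSet<>();
        rows.add(Set.of(new Foo("A", 1), new Foo("B", 3), new Foo("C", 4)));
        rows.add(Set.of(new Foo("A", 1), new Foo("B", 2), new Foo("C", 4)));

        Set<Foo> candidate = Set.of(new Foo("C", 4), new Foo("B", 3), new Foo("A", 1));
        System.out.println(rows.contains(candidate)); // true: element order is irrelevant
    }
}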
I would recommend you use a less unusual data structure. As for finding stuff: generally hashes, sorting + binary searching, or trees are the ways to go, depending on how much insertion/deletion you expect. Read a book on basic data structures and algorithms instead of trying to re-invent the wheel.
Lastly: if this is not a purely academic question, loop through the lists and do the comparison. Most likely that is acceptably fast. Even 100,000 entries will take a fraction of a second, and therefore not matter in 99% of all use cases.
I like to quote Knuth: premature optimisation is the root of all evil.
