Neo4j How to create relationships after ingesting data - java

Say I have a CSV file in which one of the columns is a Unix timestamp. The file is not sorted on that column, but following the rows in timestamp order could be a useful relationship. Whenever I want that order I could use ORDER BY, but adding relationship pointers should be faster, right, and make better use of the NoSQL store? Do I have to sort the data and add the relationship as I ingest, or can I add it afterwards from a query result?
After I run the first query and get a subset back:
result = engine.execute(START...WHERE...ORDER BY time)
Can I then go through results adding this relationship like:
prev.createRelationshipTo(next, PRECEDES);
I tried two different ways using foreach or iterator and both had runtime errors casting a string to a Node:
for (Map<String, Object> row : result) {
    String n = (String) row.values().iterator().next();
    System.out.println(n);
}
Iterator<Node> nodes = result.columnAs("n.chosen");
if (nodes.hasNext()) {
    Node prev = nodes.next();
    while (nodes.hasNext()) {
        Node n = nodes.next();
        prev.createRelationshipTo(n, null);
        prev = n;
    }
}
Also, there is the edge case of two rows having the same timestamp. I don't care about the order that is chosen, but I want it to not break the relation chain.
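A minimal sketch of the chaining step in plain Java, assuming the query returns the nodes themselves rather than a property such as n.chosen (a property value would likely come back as a String, which would fit the cast error described above). Row and its next field are hypothetical stand-ins for a Neo4j Node and a PRECEDES relationship; this is not the Neo4j API. Every row is linked exactly once to whatever follows it in sorted order, so rows with equal timestamps do not break the chain.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ChainByTimestamp {
    /** Hypothetical stand-in for a graph node; this is not the Neo4j Node interface. */
    static final class Row {
        final String name;
        final long time;
        Row next; // plays the role of an outgoing PRECEDES relationship

        Row(String name, long time) {
            this.name = name;
            this.time = time;
        }
    }

    /** Sorts a copy of the rows by timestamp, links each row to its successor, returns the head. */
    static Row link(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        // List.sort is stable, so rows with equal timestamps keep their input order.
        sorted.sort(Comparator.comparingLong((Row r) -> r.time));
        Row prev = null;
        for (Row r : sorted) {
            if (prev != null) {
                prev.next = r; // analogous to prev.createRelationshipTo(r, PRECEDES)
            }
            prev = r;
        }
        return sorted.isEmpty() ? null : sorted.get(0);
    }

    /** Walks the chain from head and joins the names with " -> ". */
    static String describe(Row head) {
        StringBuilder sb = new StringBuilder();
        for (Row r = head; r != null; r = r.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(r.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("a", 30));
        rows.add(new Row("b", 10));
        rows.add(new Row("c", 30)); // same timestamp as "a"
        rows.add(new Row("d", 20));
        System.out.println(describe(link(rows)));
    }
}
```

In the real database the same loop would run inside a transaction, with the relationship type supplied instead of null.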

Finding the children of each and every node while traversing upto the leaves of the tree using Gremlin

Suppose I have a tree like this: [tree diagram]
Now, starting at vertex 8, I want to find the children of vertex 8, i.e. {3, 10}, store them in a map, and then repeat the same procedure for the remaining vertices until I reach the leaf nodes (the height of the tree is unknown when writing the query).
I want to write a query that performs these operations and returns an Iterator containing these maps.
Please help me write this query.
You might consider tree step:
gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> t=new Tree();g.v(1).out.tree(t).loop(2){true}
gremlin> t
==>v[1]={v[2]={}, v[3]={}, v[4]={}}
==>v[4]={v[3]={}, v[5]={}}
I didn't fully follow the expected format portion of your question, so I'm not sure if the above accomplishes exactly what you asked for. You could likely use some groovy to convert t from there as needed as it gets most of the work done out-of-the-box.
To perform the same query with GremlinPipeline in Java:
Tree tree = new Tree();
new GremlinPipeline(graph).V().has("mgrNo", 814754).out("manager of").tree(tree).loop(2, new PipeFunction<LoopBundle, Boolean>() {
    @Override
    public Boolean compute(LoopBundle loopBundle) {
        return true; // always continue looping
    }
}).iterate();
// print each top-level vertex together with its subtree
Iterator it = tree.entrySet().iterator();
while (it.hasNext()) {
    Map.Entry pair = (Map.Entry) it.next();
    System.out.println(pair.getKey() + " = " + pair.getValue());
}
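The per-vertex maps described in the question can also be sketched in plain Java, independent of Gremlin. This is a breadth-first walk over a hypothetical adjacency map (vertex id to child ids). Only vertices 8, 3 and 10 come from the question; the other ids are made up for illustration, and no TinkerPop API is used.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class ChildrenByLevel {
    /**
     * Walks the tree breadth-first from start and returns one single-entry map
     * (vertex -> its children) for every non-leaf vertex, in visiting order.
     * Assumes the input really is a tree (no cycles).
     */
    static Iterator<Map<Integer, List<Integer>>> childrenMaps(
            Map<Integer, List<Integer>> adjacency, int start) {
        List<Map<Integer, List<Integer>>> result = new ArrayList<>();
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            Integer vertex = queue.remove();
            List<Integer> children = adjacency.get(vertex);
            if (children == null || children.isEmpty()) {
                continue; // leaf: nothing to record, nothing to expand
            }
            Map<Integer, List<Integer>> entry = Collections.singletonMap(vertex, children);
            result.add(entry);
            queue.addAll(children);
        }
        return result.iterator();
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> adjacency = new HashMap<>();
        adjacency.put(8, Arrays.asList(3, 10));
        adjacency.put(3, Arrays.asList(1, 6));  // illustrative ids
        adjacency.put(10, Arrays.asList(14));   // illustrative id
        Iterator<Map<Integer, List<Integer>>> it = childrenMaps(adjacency, 8);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```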

The efficient way to get a list of objects that are not in the database in Java

Suppose I have a list of objects (ArrayList objects) and a db table for those objects. I want to find the objects that have not yet been stored in my database. The objects are identified by their "id". I can think of two solutions, but I do not know which one is more efficient.
The first solution I thought of is to construct one db query that fetches all the objects that already exist in the db, then loop through those existing objects to determine which ones are not in the db:
ArrayList<Integer> ids = new ArrayList<Integer>();
for (MyObject o : objects) {
    ids.add(o.getId());
}
// I use Sugar ORM on Android; the raw query can be seen as
// "select * from my_object where id in [ id1,id2,id3 ..... ]"
List<MyObjectRow> unwanted_objects = MyObject.find("id in (?,?,?,?,.....)",ids);
// remove the query results from the original ArrayList
// (an explicit Iterator avoids a ConcurrentModificationException while removing)
for (MyObjectRow o : unwanted_objects) {
    Iterator<MyObject> it = objects.iterator();
    while (it.hasNext()) {
        if (it.next().getId() == o.getId()) it.remove();
    }
}
The second solution is to query the existence of each object in the db and add the objects that do not exist to a result list:
ArrayList<MyObject> result_objects = new ArrayList<MyObject>();
boolean exist = false;
for (MyObject o : objects) {
    exist = MyObject.find("EXIST( select 1 from my_object where id = ?)", o.getId());
    if (!exist) {
        result_objects.add(o);
    }
}
The first solution requires only one query, but looping through all the objects found makes the complexity O(n*n).
The second solution issues n db queries, but its complexity is only O(n).
Which one is likely to perform better?
I would use option 1 with a change to use a Map<Integer, MyObject> to improve the performance of the removal of query results from the original list:
List<Integer> ids = new ArrayList<Integer>();
Map<Integer, MyObject> mapToInsert = new HashMap<Integer, MyObject>();
for (MyObject o : objects) {
    // add the ids of the objects to possibly insert
    ids.add(o.getId());
    // use the id of the object as the key in the map
    mapToInsert.put(o.getId(), o);
}
// retrieve the elements that already exist in the database
List<MyObjectRow> unwanted_objects = MyObject.find("id in (?,?,?,?,.....)",ids);
// remove the query results from the map, not the list
for (MyObjectRow o : unwanted_objects) {
    mapToInsert.remove(o.getId());
}
// insert the values that still exist in mapToInsert
Collection<MyObject> valuesToInsert = mapToInsert.values();
You don't know the efficiency of the database operations. If the database is a B-tree under the hood, that query could take O(log n). If your indices aren't set up correctly, you may be looking at O(n) performance for that query. Your measurement of efficiency here also ignores any transaction costs: the cost to initiate a connection with the database, process the query, and close the connection. This is a 'fixed' cost, and I wouldn't want to pay it in a loop if I didn't have to.
Go with the first solution.
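A minimal sketch of that single-query approach, assuming the ids that already exist have been fetched in one query and collected into a list. MyObject here is a hypothetical stand-in with only an id, and existingIds stands in for the query result; no Sugar ORM call is shown. A HashSet makes each membership check O(1) on average, so the filtering pass is O(n).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MissingObjects {
    /** Hypothetical stand-in for the question's MyObject. */
    static final class MyObject {
        private final int id;

        MyObject(int id) {
            this.id = id;
        }

        int getId() {
            return id;
        }
    }

    /** Returns the objects whose id is not among the ids already stored. */
    static List<MyObject> notStored(List<MyObject> objects, List<Integer> existingIds) {
        Set<Integer> stored = new HashSet<>(existingIds);
        List<MyObject> missing = new ArrayList<>();
        for (MyObject o : objects) {
            if (!stored.contains(o.getId())) {
                missing.add(o);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<MyObject> objects = Arrays.asList(
                new MyObject(1), new MyObject(2), new MyObject(3), new MyObject(4));
        // Pretend result of the single "id in (...)" query.
        List<Integer> existingIds = Arrays.asList(2, 4);
        for (MyObject o : notStored(objects, existingIds)) {
            System.out.println(o.getId());
        }
    }
}
```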

Create recursive structure from flatten DFS structure

Problem
I have the following tree:
    2
   / \
  3   5
 /   / \
6   4   1
that is represented in the following way and order:
id   parent
------------
2    null
3    2
6    3
5    2
4    5
1    5
Purpose:
Store this flattened tree in a recursive structure in O(n) time (O(n*log(n)) is acceptable, but not ideal). I know how to solve it in O(n^2), but I stored the data in DFS order so that I could "parse" it more efficiently. E.g.:
class R {
    int id;
    List<R> children;
}
that looks like this in JSON form:
{
    id: 2,
    children: [
        {
            id: 3,
            children: [ ... ]
        },
        {
            id: 5,
            children: [ ... ]
        }
    ]
}
How can I do this? The programming language is not important, because I can translate it in Java.
Java code:
R r = new R();
Map<Long, Line> map = createMap2();
List<Line> vals = new ArrayList<Line>(map.values());
r.id = vals.get(0).id;
vals.remove(0);
r.children = createResource(vals, r.id);
...
private static List<R> createResource(List<Line> l, Long pid) {
    List<R> lr = new ArrayList<R>();
    if ( l.size() > 0 ) {
        Long id = l.get(0).id;
        Long p = l.get(0).pid;
        l.remove(0);
        if ( pid.equals(p) ) {
            R r = new R();
            r.id = id;
            r.children = createResource(l, id);
            lr.add(r);
        }
        else {
            return createResource(l, pid); // of course, this is not ok
        }
    }
    return lr;
}
The problem with the code above is that only 2, 3 and 6 are stored in the recursive structure (class R). I want to store the whole flattened tree (many Line objects) in that recursive structure (an R object), not only some of the nodes.
P.S.: The problem is simplified. I'm not interested in a solution tied to this specific example, because there are many fields involved and thousands of entries. I am also interested in solutions that work well in worst-case scenarios (different kinds of trees), because this is the user's guarantee.
What about something like this? In the first pass, hash the parents as arrays of their children and identify the root; in the second, begin with the root and for each of its children, insert a new object, with its own children and so on:
To take your example, the first pass would generate
parent_hash = {2:[3,5], 3:[6], 5:[4,1]}
root = 2
The second pass would go something like:
object 2 -> object 3 -> object 6
         -> object 5 -> object 4
                     -> object 1
done
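The two passes can be sketched as follows. Line and R mirror the classes in the question, with the field types (long and Long) assumed for illustration; build, expand and render are hypothetical helper names. The recursion in the second pass could overflow the call stack on very deep trees, in which case an explicit stack would be needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TreeFromParentMap {
    static final class Line {
        final long id;
        final Long pid; // null for the root

        Line(long id, Long pid) {
            this.id = id;
            this.pid = pid;
        }
    }

    static final class R {
        long id;
        List<R> children = new ArrayList<>();
    }

    static R build(List<Line> lines) {
        // First pass: group child ids under their parent id and find the root.
        Map<Long, List<Long>> childrenOf = new HashMap<>();
        Long root = null;
        for (Line line : lines) {
            if (line.pid == null) {
                root = line.id;
            } else {
                childrenOf.computeIfAbsent(line.pid, k -> new ArrayList<>()).add(line.id);
            }
        }
        // Second pass: start at the root and expand each node's children in turn.
        return root == null ? null : expand(root, childrenOf);
    }

    private static R expand(long id, Map<Long, List<Long>> childrenOf) {
        R r = new R();
        r.id = id;
        for (long childId : childrenOf.getOrDefault(id, Collections.<Long>emptyList())) {
            r.children.add(expand(childId, childrenOf));
        }
        return r;
    }

    /** Renders a node as id(child,child,...) for quick inspection. */
    static String render(R r) {
        StringBuilder sb = new StringBuilder().append(r.id);
        if (!r.children.isEmpty()) {
            sb.append('(');
            for (int i = 0; i < r.children.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(render(r.children.get(i)));
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Line> lines = Arrays.asList(
                new Line(2, null), new Line(3, 2L), new Line(6, 3L),
                new Line(5, 2L), new Line(4, 5L), new Line(1, 5L));
        System.out.println(render(build(lines)));
    }
}
```

Each line is touched once in the first pass and each node once in the second, so the total work is O(n) on average, and the input does not have to be in DFS order.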
The problem with your code is that once an entry doesn't satisfy p == pid condition it is lost forever. Instead of losing entries you should break the loop and return immediately. The offending entry shall also be returned and handled by a proper instance of R upstream.
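One way to realise this is to share a cursor between the recursive calls, so that an entry that is not a child of the current node is left in place for a caller further up to consume. The sketch below assumes the input is in DFS (pre-order) order as in the question. The cursor array, the field types and the class names are illustrative and not taken from the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TreeFromDfsOrder {
    static final class Line {
        final long id;
        final Long pid; // null for the root

        Line(long id, Long pid) {
            this.id = id;
            this.pid = pid;
        }
    }

    static final class R {
        long id;
        List<R> children = new ArrayList<>();
    }

    /** Builds the tree rooted at the first line; lines must be in DFS (pre-order) order. */
    static R build(List<Line> lines) {
        if (lines.isEmpty()) {
            return null;
        }
        int[] cursor = {0}; // index of the next unconsumed line, shared by all calls
        return read(lines, cursor);
    }

    private static R read(List<Line> lines, int[] cursor) {
        Line self = lines.get(cursor[0]++);
        R r = new R();
        r.id = self.id;
        // Consume lines while they are children of this node. Stop at the first
        // line that is not, leaving it in place for an ancestor to pick up.
        while (cursor[0] < lines.size()
                && Long.valueOf(self.id).equals(lines.get(cursor[0]).pid)) {
            r.children.add(read(lines, cursor));
        }
        return r;
    }

    /** Renders a node as id(child,child,...) for quick inspection. */
    static String render(R r) {
        StringBuilder sb = new StringBuilder().append(r.id);
        if (!r.children.isEmpty()) {
            sb.append('(');
            for (int i = 0; i < r.children.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(render(r.children.get(i)));
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Line> lines = Arrays.asList(
                new Line(2, null), new Line(3, 2L), new Line(6, 3L),
                new Line(5, 2L), new Line(4, 5L), new Line(1, 5L));
        System.out.println(render(build(lines)));
    }
}
```

Because the cursor only moves forward and each line is consumed exactly once, the whole build is O(n) when the list supports constant-time get, as ArrayList does.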
You can easily represent the whole tree in an array, since each node of the tree can be represented by an index in the array. For a binary tree, the children of index i would be at index 2*i+1 and index 2*i+2. It would then be simple to convert the array to any other representation. The array itself would be a space-efficient representation for balanced trees, but would waste a lot of space for very unbalanced trees. (This should not matter unless you're dealing with a very large amount of data.)
If you need a memory-efficient way for large unbalanced trees, then it would make sense to use the standard Node-representation of trees. To convert from your list, you could use a HashMap as גלעד ברקן suggested. However, if the id's of the nodes are mostly continuous (like the example where they're from 1 to 6), you could also just use an array where each index of the array i is used to store a Node with an id of i. This will let you easily find a parent node and assign it children nodes as they're created.
(cf. my Trees tutorial for storing trees as arrays.)
I found a simple solution based on that "DFS" order.
This approach works even if I use a list of "Line" objects or a map.
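private static List<R> createResource(List<Line> l, Long pid) {
    List<R> lr = new ArrayList<R>();
    // In DFS order, the next child of pid (if any) is always at the head of the list.
    while ( !l.isEmpty() && pid.equals(l.get(0).pid) ) {
        // use a LinkedList so that remove(0) is O(1)
        Line line = l.remove(0);
        R r = new R();
        r.id = line.id;
        // the recursive call consumes the whole subtree of this child
        r.children = createResource(l, line.id);
        lr.add(r);
    }
    return lr;
}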
It looks like O(n^2) because there is a loop plus recursion over all elements, but it is O(n). Thanks to the DFS order, the next element for which createResource is called is always at the first position (index 0, so it is found in O(1)). Because the recursion visits every element exactly once, the complexity is O(n).
But if the order is not the DFS order (for example, if a Map that is not a LinkedHashMap is involved), I recommend the solution that uses arrays of children per parent (as suggested by גלעד ברקן).

GAE, JDO: Add, Move & Delete entities of ordered list

Hi, I have a parent entity, say A, which has a list of child entities, say List<B> children.
I need the order of the child entities to be maintained, since it is important for my application.
I have done this following the section "How Ordered Collections Maintain Their Order" of:
https://developers.google.com/appengine/docs/java/datastore/jdo/relationships#Owned_One_to_Many_Relationships
@Persistent
@Element(dependent = "true")
@Order(extensions = @Extension(vendorName="datanucleus", key="list-ordering", value="index ASC"))
private List objects;
Now I add to the list using:
newObj.setIndex(0);
for (int i = 0; i < objList.size(); i++) {
    objList.get(i).setIndex(i + 1);
}
objList.add(newObj);
Move using:
if (direction.equalsIgnoreCase("up")) {
    objList.get(index).setIndex(index - 1);
    objList.get(index - 1).setIndex(index);
}
else if (direction.equalsIgnoreCase("down")) {
    objList.get(index).setIndex(index + 1);
    objList.get(index + 1).setIndex(index);
}
And delete using:
for (int i = index + 1; i < objList.size(); i++) {
    objList.get(i).setIndex(i - 1);
}
objList.remove(index);
Is this the right way to do it? Add and Move seem to work, but Delete behaves weirdly: random objects get deleted and the list ends up in a completely inconsistent state!
GAE: 1.7.2
DataNucleus Enhancer (version 3.1.0.m2)
Removing at an index only really makes sense for an indexed list (i.e. the standard JDO List), and you're not using that. When you call it with DataNucleus and an RDBMS, an exception is thrown. Evidently GAE didn't get around to such niceties, though logic would suggest the same behaviour. Removing something from an ordered list really ought to call remove(Object).
Moving objects around using setting of this index column may work ... at the next time they are read in; the only thing that the "ordering" clause does is order things at the point they are read in.
Try calling objList.remove() before adjusting the indexes.
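A plain-Java sketch of that ordering of operations: remove the element by reference first, then renumber what remains. Item and its index field are hypothetical stand-ins for the persistent child class. No JDO or DataNucleus behaviour is modelled, so this only illustrates the bookkeeping, not how the datastore will react. It also assumes the in-memory list is already in index order.

```java
import java.util.ArrayList;
import java.util.List;

class OrderedChildren {
    /** Hypothetical stand-in for the persistent child entity. */
    static final class Item {
        final String name;
        int index;

        Item(String name, int index) {
            this.name = name;
            this.index = index;
        }
    }

    /** Removes the item by reference (not by position), then closes the gap in the index values. */
    static boolean removeAndRenumber(List<Item> items, Item toRemove) {
        boolean removed = items.remove(toRemove); // List.remove(Object)
        if (removed) {
            for (int i = 0; i < items.size(); i++) {
                items.get(i).index = i;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        Item b = new Item("b", 1);
        items.add(new Item("a", 0));
        items.add(b);
        items.add(new Item("c", 2));
        removeAndRenumber(items, b);
        for (Item item : items) {
            System.out.println(item.name + " " + item.index);
        }
    }
}
```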

Sorting of 2 or more massive resultsets?

I need to be able to sort multiple intermediate result sets and write them to a file in sorted order. The sort is based on a single column/key value. Each result set record is a list of values (like a record in a table).
The intermediate result sets are obtained by querying entirely different databases.
The intermediate result sets are already sorted on some key (or column). They need to be combined and sorted again on the same key (or column) before being written to a file.
Since these result sets can be massive (on the order of MBs), this cannot be done in memory.
My solution, broadly:
Use a hash map and a random access file. Since the result sets are already sorted, when retrieving them I will store the sorted column values as keys in a HashMap. The value in the HashMap will be an address in the random access file where every record associated with that column value is stored.
Any ideas?
Have a pointer into every set, initially pointing to the first entry.
Then choose the next result from the set that offers the lowest entry.
Write this entry to the file and increment the corresponding pointer.
This approach has basically no overhead, and for a fixed number of sets the time is O(n). (It's merge sort, btw.)
Edit
To clarify: It's the merge part of merge sort.
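For more than two result sets, the same merge step can be driven by a priority queue holding the current head of each source. The sketch below merges sorted iterators of integers. A real implementation would wrap each ResultSet in an iterator over its rows and write each row to the file instead of collecting into a list. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class KWayMerge {
    /** The current head of one source, plus the iterator it came from. */
    private static final class Head implements Comparable<Head> {
        final int value;
        final Iterator<Integer> source;

        Head(int value, Iterator<Integer> source) {
            this.value = value;
            this.source = source;
        }

        @Override
        public int compareTo(Head other) {
            return Integer.compare(value, other.value);
        }
    }

    /** Merges already-sorted sources; the queue holds only one pending element per source. */
    static List<Integer> merge(List<Iterator<Integer>> sources) {
        PriorityQueue<Head> heads = new PriorityQueue<>();
        for (Iterator<Integer> source : sources) {
            if (source.hasNext()) {
                heads.add(new Head(source.next(), source));
            }
        }
        List<Integer> out = new ArrayList<>(); // stand-in for the output file
        while (!heads.isEmpty()) {
            Head smallest = heads.poll();
            out.add(smallest.value);
            // Advance the source the smallest element came from.
            if (smallest.source.hasNext()) {
                heads.add(new Head(smallest.source.next(), smallest.source));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> sources = new ArrayList<>();
        sources.add(Arrays.asList(1, 4, 9).iterator());
        sources.add(Arrays.asList(2, 3, 10).iterator());
        sources.add(Arrays.asList(5).iterator());
        System.out.println(merge(sources));
    }
}
```

With k sources and n records in total, each record passes through the queue once, so the merge costs O(n log k) comparisons.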
If you've got 2 pre-sorted result sets, you should be able to iterate them concurrently while writing the output file. You just need to compare the current row in each set:
Simple example (not ready for copy-and-paste use!):
ResultSet a, b;
// fetch a and b (both must be scrollable for first() to work)
boolean hasA = a.first(); // false if the result set is empty
boolean hasB = b.first();
while (hasA || hasB) {
    if (!hasA) {
        // a is exhausted: drain b
        writeToFile(b);
        hasB = b.next();
    } else if (!hasB) {
        // b is exhausted: drain a
        writeToFile(a);
        hasA = a.next();
    } else {
        int valueA = a.getInt("SORT_PROPERTY");
        int valueB = b.getInt("SORT_PROPERTY");
        if (valueA < valueB) {
            writeToFile(a);
            hasA = a.next();
        } else {
            writeToFile(b);
            hasB = b.next();
        }
    }
}
Sounds like you are looking for an implementation of the Balance Line algorithm.
