Extract subgraph in neo4j using cypher query - java

I'm using Neo4j 3.1 with Java 8, and I want to extract a connected subgraph so that I can store it as a test database.
Is this possible, and if so, how?
How can I do it with the RETURN clause, which returns the output? Do I have to create new nodes and relationships, or can I just export the subgraph and put it in a new database?
How can I extract a connected subgraph, given that my graph is disconnected?
Thank you

There are two parts to this: getting the connected subgraph, and then finding a means to export it.
APOC Procedures seems like it can cover both of these. The approach in this answer using the path expander should get you all the nodes in the connected subgraph (if the relationship type doesn't matter, leave off the relationshipFilter parameter).
The next step is to get all relationships between all of those nodes. APOC's apoc.algo.cover() procedure in the graph algorithms section should accomplish this.
Something like this (assuming this is after the subgraph query, and subgraphNode is in scope for the column of distinct subgraph nodes):
...
WITH COLLECT(subgraphNode) as subgraph, COLLECT(id(subgraphNode)) as ids
CALL apoc.algo.cover(ids) YIELD rel
WITH subgraph, COLLECT(rel) as rels
...
Now that you have the collections of both the nodes and relationships in the subgraph, you can export them.
APOC Procedures offers several means of exporting, from CSV to CypherScript. You should be able to find an option that works for you.
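The two steps described above (collect the connected nodes, then collect every relationship between them) can be illustrated without Neo4j on a small in-memory graph. This is only a sketch of the idea, not of what APOC does internally; the SubgraphSketch and Edge names are hypothetical and are not part of the Neo4j or APOC APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model; not part of the Neo4j or APOC APIs.
final class SubgraphSketch {

    /** A directed edge between two node ids. */
    static final class Edge {
        final long from;
        final long to;

        Edge(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    /** Step 1: collect every node reachable from start, ignoring edge direction. */
    static Set<Long> connectedNodes(long start, List<Edge> edges) {
        Set<Long> seen = new LinkedHashSet<>();
        Deque<Long> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            long current = queue.poll();
            for (Edge e : edges) {
                // Set.add returns true only for nodes not visited before.
                if (e.from == current && seen.add(e.to)) {
                    queue.add(e.to);
                }
                if (e.to == current && seen.add(e.from)) {
                    queue.add(e.from);
                }
            }
        }
        return seen;
    }

    /** Step 2: keep only the edges whose two endpoints are both in the node set. */
    static List<Edge> coveringEdges(Set<Long> nodes, List<Edge> edges) {
        List<Edge> result = new ArrayList<>();
        for (Edge e : edges) {
            if (nodes.contains(e.from) && nodes.contains(e.to)) {
                result.add(e);
            }
        }
        return result;
    }
}
```

Calling connectedNodes for one start node and then coveringEdges on its result yields exactly one connected component of a disconnected graph, which is the part you would then export.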

You can also use the neo4j-shell to dump the result of a query to a file, and then use that same file to import the data into another Neo4j database:
ikwattro@graphaware-team ~/d/_/310> ./bin/neo4j-shell -c 'dump MATCH (n:Product)-[r*2]->(x) RETURN n, r, x;' > result.cypher
Check the file:
ikwattro@graphaware-team ~/d/_/310> cat result.cypher
begin
commit
begin
create (_1:`Product` {`id`:"product123"})
create (_2:`ProductInformation` {`id`:"product123EXCEL"})
create (_3:`ProductInformationElement` {`id`:"product123EXCELtitle", `key`:"title", `value`:"Original Title"})
create (_5:`ProductInformationElement` {`id`:"product123EXCELproduct_type", `key`:"product_type", `value`:"casual_bag"})
create (_1)-[:`PRODUCT_INFORMATION`]->(_2)
create (_2)-[:`INFORMATION_ELEMENT`]->(_3)
create (_2)-[:`INFORMATION_ELEMENT`]->(_5)
;
commit
Use this file to feed another Neo4j instance:
ikwattro@graphaware-team ~/d/_/310> ./bin/neo4j-shell -file result.cypher
Transaction started
Transaction committed
Transaction started
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 4
Relationships created: 3
Properties set: 8
Labels added: 4
52 ms
Transaction committed

Spark join/groupBy on datasets takes a lot of time

I have 2 datasets (tables) with 35 million+ rows.
I try to join (or group by) these datasets on some id (in general the relationship is one-to-one).
But this operation takes a lot of time: 25+ hours.
Filters on their own work fine: ~20 minutes.
Env: emr-5.3.1
Hadoop distribution:Amazon
Applications:Ganglia 3.7.2, Spark 2.1.0, Zeppelin 0.6.2
Instance type: m3.xlarge
Code (groupBy):
Dataset<Row> dataset = ...
...
.groupBy("id")
.agg(functions.min("date"))
.withColumnRenamed("min(date)", "minDate")
Code (join):
...
.join(dataset2, dataset.col("id").equalTo(dataset2.col("id")))
Also I found this message in EMR logs:
HashAggregateExec: spark.sql.codegen.aggregate.map.twolevel.enable is set to true, but current version of codegened fast hashmap does not support this aggregate.
The data may be skewed. We faced this. Check your joining column: this happens mostly when the joining column contains many NULLs.
Check how the data is distributed with the following query (count(*) is used so that rows with a NULL key are counted too):
select joining_column, count(*) from <tablename>
group by joining_column
This will tell you whether the data in your joining column is evenly distributed.
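The same check can be sketched in plain Java on a sample of join keys. This is a minimal illustration only, assuming the keys have already been pulled into a list; KeySkewSketch and its methods are hypothetical names, not Spark API. NULL keys get their own bucket, which is what makes a NULL-heavy column visible.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for inspecting a sample of join keys; not part of Spark.
final class KeySkewSketch {

    /** Counts rows per join key; null keys get their own bucket. */
    static Map<String, Long> countPerKey(List<String> keys) {
        Map<String, Long> counts = new HashMap<>();
        for (String key : keys) {
            // HashMap accepts a null key, so NULL rows are counted as well.
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    /** Share of all rows held by the single most frequent key (0.0 for empty input). */
    static double largestShare(List<String> keys) {
        if (keys.isEmpty()) {
            return 0.0;
        }
        long max = 0;
        for (long count : countPerKey(keys).values()) {
            max = Math.max(max, count);
        }
        return (double) max / keys.size();
    }
}
```

If one key (often NULL) holds a large share of the rows, all of those rows end up in the same partition during a join or groupBy, and that single task dominates the run time.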

Neo4j Bulk Data - Create Relationship [OutOfMemory Exception]

I am using a Neo4j procedure to create relationships on bulk data.
Initially I insert all of the data using LOAD CSV:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///XXXX.csv" AS row
....
The data set is very large [10M], but the load executed successfully.
My problem is that I want to create many-to-many relationships between all of these nodes,
but I get an exception [OutOfMemoryException] while executing this query:
MATCH(n1:x{REMARKS :"LATEST"}) MATCH(n2:x{REMARKS :"LATEST"}) WHERE n1.DIST_ID=n2.ENROLLER_ID CREATE (n1)-[:ENROLLER]->(n2) ;
I have already created indexes and constraints as well.
Any ideas? Please help.
The problem is that your query runs in a single transaction, which leads to the exception [OutOfMemoryException]. This is hard to avoid in plain Cypher because, at the moment, periodic commits are only available with LOAD CSV. So you can, for example, re-read the CSV after the first load:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///XXXX.csv" AS row
MATCH (n1:x{REMARKS :"LATEST", DIST_ID: row.DIST_ID})
WITH n1
MATCH(n2:x{REMARKS :"LATEST"}) WHERE n1.DIST_ID=n2.ENROLLER_ID
CREATE (n1)-[:ENROLLER]->(n2) ;
Or try the trick with periodic committing from the APOC library:
call apoc.periodic.commit("
MATCH (n2:x {REMARKS:'LATEST'}) WHERE exists(n2.ENROLLER_ID)
WITH n2 LIMIT {perCommit}
OPTIONAL MATCH (n1:x {REMARKS:'LATEST'}) WHERE n1.DIST_ID = n2.ENROLLER_ID
WITH n2, collect(n1) as n1s
FOREACH(n1 in n1s|
CREATE (n1)-[:ENROLLER]->(n2)
)
REMOVE n2.ENROLLER_ID
RETURN count(n2)",
{perCommit: 1000}
)
P.S. The ENROLLER_ID property is used as a flag for selecting the nodes that still need processing. Of course, you can use another flag that is set during processing.
Or, more precisely, with apoc.periodic.iterate:
CALL apoc.periodic.iterate("
MATCH (n1:x {REMARKS:'LATEST'})
MATCH (n2:x {REMARKS:'LATEST'}) WHERE n1.DIST_ID = n2.ENROLLER_ID
RETURN n1,n2
","
WITH {n1} as n1, {n2} as n2
MERGE (n1)-[:ENROLLER]->(n2)
", {batchSize:10000, parallel:true}
)
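The idea shared by all three variants is to split the work into batches and commit each batch in its own transaction, so that no single transaction has to hold all of the changes in memory. A plain Java sketch of the splitting step follows; BatchSketch is a hypothetical helper, not part of Neo4j or APOC, and the code that would open and commit a transaction per batch is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the per-batch transaction handling is not shown.
final class BatchSketch {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            List<T> batch = new ArrayList<>(items.subList(i, end));
            result.add(batch);
        }
        return result;
    }
}
```

A client would then loop over the batches and run the CREATE or MERGE for each batch in a separate transaction, which mirrors what the batchSize and perCommit parameters above control.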

How do I query OrientDB Vertex graph object by Record ID in Java?

How do I retrieve an OrientDB Document/Object or Graph object using its Record ID? (Language: Java)
I'm referring to http://orientdb.com/docs/2.0/orientdb.wiki/Tutorial-Record-ID.html and Vertex.getId() / Edge.getId() methods.
It is just like an SQL query "SELECT * from aTable WHERE ID = 1".
Usage/purpose description: I want to store the generated ID after it is created by OrientDB, and later retrieve the same object using the same ID.
(1) I'd suggest using OrientDB 2.1, and its documentation, e.g. http://orientdb.com/docs/2.1/Tutorial-Record-ID.html
(2) From your post, it's unclear to me whether you need help obtaining the RID from the results of a query, or retrieving an object given its RID, so let me begin by mentioning that the former can be accomplished as illustrated by this example (in the case of an INSERT query):
ODocument result = db.command(new OCommandSQL(<INSERTQUERY>)).execute();
System.out.println(result.field("@rid"));
Going the other way around, there are several approaches. I have verified that the following does work using Version 2.1.8:
OrientGraph graph = new OrientGraph("plocal:PATH_TO_DB", "admin", "admin");
Vertex v = graph.getVertex("#16:0");
An alternative and more generic approach is to construct and execute a SELECT query of the form SELECT FROM :RID, along the lines of this example:
List<ODocument> results = db.query(new OSQLSynchQuery<ODocument>("select from " + rid));
for (ODocument aDoc : results) {
System.out.println(aDoc.field("name"));
}
(3) In practice, it will usually be better to use some other "handle" on OrientDB vertices and edges in Java code, or indeed when using any of the supported programming languages. For example, once one has a vertex as a Java Vertex, as in the "Vertex v" example above, one can usually use it.
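For the "select from " + rid approach in (2), it may be worth checking that the string really is a record ID before concatenating it into a query. The sketch below is an illustration only: RidSketch is a hypothetical helper, not part of the OrientDB API, and it assumes persistent record IDs of the form #<cluster-id>:<position> with non-negative numbers, as in "#16:0".

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of the OrientDB API.
final class RidSketch {

    // Assumption: persistent record IDs look like "#<cluster>:<position>"
    // with non-negative numbers, e.g. "#16:0".
    private static final Pattern RID = Pattern.compile("#\\d+:\\d+");

    /** Returns true if the string has the shape of a record ID. */
    static boolean isValidRid(String rid) {
        return rid != null && RID.matcher(rid).matches();
    }

    /** Builds the SELECT statement only for well-formed record IDs. */
    static String selectByRid(String rid) {
        if (!isValidRid(rid)) {
            throw new IllegalArgumentException("Not a record ID: " + rid);
        }
        return "select from " + rid;
    }
}
```

The returned string can be passed to OSQLSynchQuery as in the example in (2); anything that does not look like a record ID is rejected before it reaches the database.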

How to apply aggregate functions(like MIN, MAX, COUNT) in JCR-SQL2?

I have some records stored as nodes in JCR, and the name of each node is its primary key, e.g. 1, 2, 3.
But here is where the problem starts:
the records are as follows: 1, 2, 6, 53, 54
where the numbers above are nodes under the EMP unstructured node.
If I do
long count = empNode.getNodes().getSize() I get 5, as there are 5 nodes.
So I do count++, which gives me 6, but 6 already exists, so I can't create a node named 6 under EMP [nt:unstructured]. That's why I want to apply something like MAX(nodeNames) in the query. What should I do?
Update:
I use CQ5.5. EMP is an unstructured node under content, i.e. /content/EMP.
Under EMP I have unstructured nodes that hold my data, and these nodes have names such as 1, 2, etc.
I tried to find a solution with my CQ5.4 instance. Unfortunately, my attempts were not successful. When I searched Google for the keywords 'sql2 count', I found this page. The same question was asked there, and the answer was:
There is no count(*) or group by selector in JCR SQL [1], XPath [2] or
JCR-SQL2/AQM [3].
To implement such a tag cloud, you can run one query that fetches all
your content containing the relevant "tag" property:
//element(*, my:Article)[@tag]
and then iterate over the result and count your tags on the
application side by looking at the tag property values and using some
hashmap (tagid -> count).
[1] http://www.day.com/specs/jcr/1.0/ (section 8.5)
[2] http://www.day.com/specs/jcr/1.0/ (section 6.6)
[3] http://www.day.com/specs/jcr/2.0/6_Query.html
I think you can connect this answer to MAX() and MIN().
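Applied to the question, computing MAX on the application side could look like the sketch below. It assumes the child node names have already been collected into an Iterable<String>, for example by iterating empNode.getNodes() and calling getName() on each node; NextIdSketch is a hypothetical helper, not part of the JCR API.

```java
// Hypothetical helper; the node names are assumed to be collected beforehand.
final class NextIdSketch {

    /** Returns max(numeric node names) + 1, or 1 when there are no numeric names. */
    static long nextId(Iterable<String> nodeNames) {
        long max = 0;
        for (String name : nodeNames) {
            try {
                max = Math.max(max, Long.parseLong(name));
            } catch (NumberFormatException ignored) {
                // Non-numeric child names are skipped.
            }
        }
        return max + 1;
    }
}
```

For the names 1, 2, 6, 53, 54 this yields 55 rather than 6. Note that two sessions running this at the same time could compute the same value, so the node creation itself still needs to handle a name collision.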
I implemented a simple Apache Sling servlet to implement the count(*) function. More information here: https://github.com/artika4biz/sling-utils.
Official documentation can be found here: https://jackrabbit.apache.org/oak/docs/query/query-engine.html

How to pass neo4j Nodes from one database to another

I have two separate Neo4j databases. How can I pass nodes from one database to the other?
For example:
1. Machine1 - GraphDB1 - (Nodes: Students)
2. Machine2 - GraphDB2 - (Nodes: Books)
So how can I pass the Book nodes to GraphDB1?
Any help would be appreciated.
You wouldn't do that, you would create all your data in one database.
In general, you can query one database with Cypher and then create / insert the data in the second database.
On the first database, return a node and relationship-list:
start n=node(*)
match n-[r]->()
return n,r
Use a programming language to create a CSV file or a set of Cypher CREATE statements from those results. For importing CSV, see http://neo4j.org/develop/import, especially the "spreadsheet method" and/or the CSV batch-importer.
Enable auto-indexing in your second server: http://docs.neo4j.org/chunked/milestone/auto-indexing.html
Cypher Create statements for nodes and relationships look like this:
CREATE ({name:"Foo", age: 12});
CREATE ({name:"Bar", age: 18});
START n=node:node_auto_index(name="Foo"),
m=node:node_auto_index(name="Bar")
CREATE n-[:KNOWS {since:2012}]->m;
You can also check out my Neo4j-Import tools for the Neo4j-Shell: https://github.com/jexp/neo4j-shell-tools
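Generating the node CREATE statements from query results could be sketched in Java as follows. This is a minimal illustration, assuming each node's properties are available as a Map and that property keys are plain identifiers that need no quoting; CreateStatementSketch is a hypothetical helper, not part of any Neo4j library.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical helper; assumes property keys are plain identifiers.
final class CreateStatementSketch {

    /** Renders one node's properties as a Cypher CREATE statement. */
    static String createNode(Map<String, Object> properties) {
        // TreeMap gives a stable, alphabetical property order.
        Map<String, Object> sorted = new TreeMap<>(properties);
        StringJoiner props = new StringJoiner(", ", "{", "}");
        for (Map.Entry<String, Object> entry : sorted.entrySet()) {
            props.add(entry.getKey() + ":" + literal(entry.getValue()));
        }
        return "CREATE (" + props + ");";
    }

    /** Numbers and booleans are written as-is; everything else becomes a quoted string. */
    private static String literal(Object value) {
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        // Escape backslashes first, then double quotes.
        String escaped = String.valueOf(value).replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }
}
```

Writing one such line per returned node gives a script in the same shape as the CREATE statements shown above; relationships would need a second pass that looks up both end nodes, for example through the auto-index.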
