Why is a simple Cypher query so slow? - java

Using Neo4j 2.3.0 Community Edition with Oracle JDK 8 and Windows 7
I am new to Neo4j and am just trying out how it works with Java. In the Neo4j Browser I created 3 nodes with the following statement:
CREATE (c:Customer {name:'King'})-[:CREATES]->(:Order {status:'created'}),
(c)-[:CREATES]->(:Order {status:'created'})
Executed from the Neo4j Browser, the following query returns in 200 ms:
MATCH (c:Customer)-[:CREATES]->(o:Order)
WHERE c.name = 'King'
RETURN o.status
Executing this in Eclipse takes about 2500 ms, sometimes up to 3000 ms:
String query = "MATCH (c:Customer)-[:CREATES]->(o:Order) "
        + "WHERE c.name = 'King' "
        + "RETURN o.status";
Result result = db.execute(query);
This is incredibly slow! What am I doing wrong?
In addition, I ran the following snippet in Eclipse and it only took about 50 ms:
try (Transaction tx = db.beginTx()) { // embedded reads also need to run inside a transaction
    Node king = db.findNode(NodeType.Customer, "name", "King");
    Iterable<Relationship> kingRels = king.getRelationships(RelType.CREATES);
    for (Relationship rel : kingRels) {
        System.out.println(rel.getEndNode().getProperty("status"));
    }
    tx.success();
}
So there are actually two things I am surprised by:
Running a Cypher query in the Neo4j Browser seems to be way slower than doing a comparable thing with the Neo4j Core Java API in Eclipse.
Running a Cypher query "embedded" in Java code is incredibly slow compared to the Neo4j Browser solution as well as compared to the plain Java solution.
I am pretty sure that this cannot be true. So what am I doing wrong?

How do you measure it? If you measure the full runtime, your time includes JVM startup, database startup, class loading, and loading of the store files from disk.
Remember that in the browser all of that is already running and warmed up.
If you really want to measure your query, run it a number of times to warm up and then measure only the query execution and result loading.
Also consider using indexes or constraints where appropriate, and parameters, e.g. for your customer.name.
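For example, a rough way to do that in embedded mode (assuming the same db as in the question, and an index created beforehand with CREATE INDEX ON :Customer(name)) is to warm the query up a few times, pass the name as a parameter, and time only the execution and result iteration:

import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.Result;
import org.neo4j.graphdb.Transaction;

// ...

String query = "MATCH (c:Customer)-[:CREATES]->(o:Order) "
        + "WHERE c.name = {name} "
        + "RETURN o.status";
Map<String, Object> params = new HashMap<>();
params.put("name", "King");

// Warm-up: run the query a few times so caches and the compiled plan are populated.
for (int i = 0; i < 5; i++) {
    try (Transaction tx = db.beginTx()) {
        Result r = db.execute(query, params);
        while (r.hasNext()) {
            r.next();
        }
        r.close();
        tx.success();
    }
}

// Measure only query execution and result consumption.
long start = System.nanoTime();
try (Transaction tx = db.beginTx()) {
    Result result = db.execute(query, params);
    while (result.hasNext()) {
        result.next();
    }
    result.close();
    tx.success();
}
System.out.println("Query took " + (System.nanoTime() - start) / 1_000_000 + " ms");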

Related

jOOQ batch insertion inconsistency

While working with batch insertion in jOOQ (v3.14.4) I noticed some inconsistency when looking into PostgreSQL (v12.6) logs.
When doing context.batch(<query>).bind(<1st record>).bind(<2nd record>)...bind(<nth record>).execute() the logs show that the records are actually inserted one by one instead of all in one go.
Doing context.insert(<fields>).values(<1st record>).values(<2nd record>)...values(<nth record>), on the other hand, inserts everything in one go, judging by the Postgres logs.
Is it a bug in jOOQ itself, or was I using the batch(...) functionality incorrectly?
Here are two code snippets that are supposed to do the same thing, but in reality the first one inserts records one by one while the second one actually does the batch insertion.
public void batchInsertEdges(List<EdgesRecord> edges) {
    Query batchQuery = context.insertInto(Edges.EDGES,
            Edges.EDGES.SOURCE_ID, Edges.EDGES.TARGET_ID, Edges.EDGES.CALL_SITES,
            Edges.EDGES.METADATA)
            .values((Long) null, (Long) null, (CallSiteRecord[]) null, (JSONB) null)
            .onConflictOnConstraint(Keys.UNIQUE_SOURCE_TARGET).doUpdate()
            .set(Edges.EDGES.CALL_SITES, Edges.EDGES.as("excluded").CALL_SITES)
            .set(Edges.EDGES.METADATA, field("coalesce(edges.metadata, '{}'::jsonb) || excluded.metadata", JSONB.class));
    var batchBind = context.batch(batchQuery);
    for (var edge : edges) {
        batchBind = batchBind.bind(edge.getSourceId(), edge.getTargetId(),
                edge.getCallSites(), edge.getMetadata());
    }
    batchBind.execute();
}
public void batchInsertEdges(List<EdgesRecord> edges) {
    var insert = context.insertInto(Edges.EDGES,
            Edges.EDGES.SOURCE_ID, Edges.EDGES.TARGET_ID, Edges.EDGES.CALL_SITES, Edges.EDGES.METADATA);
    for (var edge : edges) {
        insert = insert.values(edge.getSourceId(), edge.getTargetId(), edge.getCallSites(), edge.getMetadata());
    }
    insert.onConflictOnConstraint(Keys.UNIQUE_SOURCE_TARGET).doUpdate()
            .set(Edges.EDGES.CALL_SITES, Edges.EDGES.as("excluded").CALL_SITES)
            .set(Edges.EDGES.METADATA, field("coalesce(edges.metadata, '{}'::jsonb) || excluded.metadata", JSONB.class))
            .execute();
}
I would appreciate some help figuring out why the first code snippet does not work as intended while the second one does. Thank you!
There's a difference between "batch processing" (as in JDBC batch) and "bulk processing" (as in what many RDBMS call "bulk updates").
This page of the manual about data import explains the difference.
Bulk size: The number of rows that are sent to the server in one SQL statement.
Batch size: The number of statements that are sent to the server in one JDBC statement batch.
These are fundamentally different things. Both help improve performance. Bulk data processing does so by helping the RDBMS optimise its resource allocation algorithms, because it knows in advance that it is about to insert, say, 10 records. Batch data processing does so by reducing the number of round trips between client and server. Whether either approach has a big impact on any given RDBMS is obviously vendor specific.
In other words, both of your approaches work as intended.
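One practical note on the second (bulk) approach: PostgreSQL limits the number of bind parameters per statement, so very large lists may need to be split into chunks. A rough sketch based on the second snippet above (the chunk size of 100 is an arbitrary illustration, using the same imports as in your code):

public void bulkInsertEdgesInChunks(List<EdgesRecord> edges) {
    final int chunkSize = 100; // arbitrary; pick a size that keeps the bind-variable count reasonable
    for (int i = 0; i < edges.size(); i += chunkSize) {
        var insert = context.insertInto(Edges.EDGES,
                Edges.EDGES.SOURCE_ID, Edges.EDGES.TARGET_ID,
                Edges.EDGES.CALL_SITES, Edges.EDGES.METADATA);
        for (var edge : edges.subList(i, Math.min(i + chunkSize, edges.size()))) {
            insert = insert.values(edge.getSourceId(), edge.getTargetId(),
                    edge.getCallSites(), edge.getMetadata());
        }
        insert.onConflictOnConstraint(Keys.UNIQUE_SOURCE_TARGET).doUpdate()
                .set(Edges.EDGES.CALL_SITES, Edges.EDGES.as("excluded").CALL_SITES)
                .set(Edges.EDGES.METADATA, field("coalesce(edges.metadata, '{}'::jsonb) || excluded.metadata", JSONB.class))
                .execute();
    }
}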

Neo4j super slow edge insert

Such a simple thing, but... I am following https://neo4j.com/docs/developer-manual/current/cypher/clauses/create/#create-create-a-relationship-between-two-nodes, yet on Neo4j the following Cypher query takes more than 120 seconds:
MATCH (from:PubmedDocumentNode), (to:PubmedAuthorNode)
WHERE from.PMID = 26408320
AND to.Author_Name = "Bando|Mika|M|"
CREATE (from)-[:AUTHOR {labels:[],label:[3.0],Id:0}]->(to)
There are also indexes:
Indexes
ON :PubmedAuthorNode(Author_Name) ONLINE
ON :PubmedDocumentNode(PMID) ONLINE
No constraints
so.. why??
EDIT: The same query without the create part runs in no time.

Aerospike Abnormal Increase In Memory Consumption when using UDF via Java Client (Query Aggregation)

So let me explain first. I have a sample snippet of Java code written like this:
Statement statement = new Statement();
statement.setNamespace("foo");
statement.setSetName("bar");
statement.setAggregateFunction(Thread.currentThread().getContextClassLoader(),
        "udf/resource/path", "udfFilename", "udfFunctionName",
        "args1", "args2", "args3", "args4");
ResultSet rs = aerospikeClient.getClient().queryAggregate(null, statement);
while (rs.next()) {
    // insert logic code here
}
With that sample snippet I was able to use a UDF written in Lua, following the Aerospike documentation. The UDF just searches across multiple bins and returns its findings; it never persists nor transforms any data.
Now the thing is, when the function that invokes the UDF runs, AMC (Aerospike Management Console) shows aggregation jobs that are marked as "done(ok)" but never marked as completed, and they stay under the "Running Jobs" table instead of the "Completed Jobs" table (see pictures below).
Jobs under the Running Jobs table
Jobs under the Completed Jobs table
Under the Bash command "top" I saw the memory usage percentage of the Aerospike server keep growing as the jobs grew in number, until the Aerospike server eventually failed because it had maxed out the machine's memory.
My questions are:
Is it possible for the jobs to let go of these resources (if they really are what causes the abnormal memory increase)?
If it isn't the jobs that are responsible, what is?
EDITED:
Sample Lua Code:
local function map_request(record)
    return map {response = record.response,
                templateId = record.templateId,
                id = record.id, requestSent = record.requestSent,
                dateReplied = record.dateReplied}
end

function checkResponse(stream, responseFilter, templateId, validityPeriod, currentDate)
    local function filterResponse(record)
        if responseFilter ~= "FOO" and validityPeriod > 0 then
            return (record.response == responseFilter) and
                   (record.templateId == templateId) and
                   (record.dateReplied + validityPeriod) > currentDate
        else
            return (record.response == responseFilter) and
                   (record.templateId == templateId)
        end
    end
    return stream:filter(filterResponse):map(map_request)
end

DB2 ERRORCODE 4499 SQLSTATE=58009

On our production application we recently started getting a weird error from DB2:
Caused by: com.ibm.websphere.ce.cm.StaleConnectionException: [jcc][t4][2055][11259][4.13.80] The database manager is not able to accept new requests, has terminated all requests in progress, or has terminated your particular request due to an error or a force interrupt. ERRORCODE=-4499, SQLSTATE=58009
This occurs when Hibernate tries to select data from one big table (more than 6 million records and 320 columns).
I observed that when the ResultSet has fewer than 10 elements, Hibernate selects successfully.
Our architecture:
Spring 4.0.3
Hibernate 4.3.5
DB2 v10 z/Os
Websphere 7.0.0.31(with JDBC V9.7FP5)
This select works when I execute it in Data Studio or when the app is started locally from Tomcat (connected to the production Data Source). I suppose that the Data Source on WebSphere is not correctly configured, but I tried some modifications without results. I also tried updating the JDBC driver, but that did not help; I then got ERRORCODE = -1244.
Ok, so now I'm looking for any help ;).
I can obviously provide additional information when needed.
Maybe someone has fought with this problem before?
Thanks in advance!
We had the same problem and finally solved it by running REORG and RUNSTATS on the table(s). In our case, the database and tables were damaged, and after running both of those operations the issue was resolved.
This occurs when hibernate tries to select data from one big table(More than 6 milions records and 320 columns)
6 million records with 320 columns seems huge to be read at once through Hibernate. Have you tried creating a database cursor and streaming a few records at a time? In plain JDBC it is done as follows:
Statement stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
        java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(50); // fetch only 50 records at a time
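To actually stream with that Statement, you would then execute the query and iterate over the cursor, e.g. (the table name is just a placeholder for your real query):

java.sql.ResultSet rs = stmt.executeQuery("SELECT * FROM MY_BIG_TABLE"); // placeholder query
try {
    while (rs.next()) {
        // process one row at a time; the driver only buffers roughly fetchSize rows
    }
} finally {
    rs.close();
    stmt.close();
}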
With Hibernate, on the other hand, you would need code like the one below:
Query query = session.createQuery(hqlQuery); // hqlQuery is your HQL select string
query.setReadOnly(true);
query.setFetchSize(50);
ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);
// iterate over the results
while (results.next()) {
    Object[] row = results.get();
    // process the row, then release the reference
    // you may need to flush() as well
}
results.close();
This allows you to stream over the result set; however, Hibernate will still cache the results in the Session, so you'll need to call session.flush() and session.clear() every so often. If you are only reading data, you might consider using a StatelessSession, though you should read its documentation beforehand.
Analyze the database table locking impact when using this approach.
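If the data is only being read, a StatelessSession avoids the first-level cache entirely, so there is nothing to flush or clear. A rough sketch of the same scrolling pattern with a StatelessSession (assuming a SessionFactory is available; hqlQuery is again your HQL select string):

StatelessSession statelessSession = sessionFactory.openStatelessSession();
try {
    Query query = statelessSession.createQuery(hqlQuery);
    query.setReadOnly(true);
    query.setFetchSize(50);
    ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);
    try {
        while (results.next()) {
            Object[] row = results.get();
            // process the row; nothing is cached by Hibernate, so no flush()/clear() is needed
        }
    } finally {
        results.close();
    }
} finally {
    statelessSession.close();
}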

Assign WriteConcern to mongo file system

Colleagues,
I'm using Mongo v2.2 and the Java Mongo driver 2.9.0.
Some business logic creates approximately 25 threads, and each thread creates 150 files on GridFS. Approximately 20 files per 1000 do not return a correct getId(), so the result is null. I think (correct me if I'm wrong) that this is correct behavior from a throughput perspective, but I really need this id. For a regular DBCollection I would set WriteConcern.FSYNC_SAFE, but I cannot see a setWriteConcern method on GridFS. Do you have any ideas how to force the files to be flushed?
Looking at driver code in GridFS.java:
_filesCollection = _db.getCollection( _bucketName + ".files" );
I can resolve the collection with the same name after creating the GridFS, so my code setting the write concern looks like this:
_fs = new GridFS(_db, "MyBucketName");
DBCollection col = _db.getCollection( "MyBucketName" + ".files" );
col.setWriteConcern(WriteConcern.SAFE);
After running the tests, I can see that all files successfully return the correct id.
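For reference, here is roughly how the pieces above fit together with the 2.9.x driver (the bucket name, file name and content are just placeholders, and _db is the same DB instance as in the snippet above):

GridFS fs = new GridFS(_db, "MyBucketName");

// Set the write concern on the underlying ".files" collection, as described above.
DBCollection filesCol = _db.getCollection("MyBucketName" + ".files");
filesCol.setWriteConcern(WriteConcern.SAFE);

GridFSInputFile file = fs.createFile("some content".getBytes());
file.setFilename("example.txt");
file.save();

// With an acknowledged write concern, the id is available right after save().
System.out.println(file.getId());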
