OutOfMemoryError caused when db4o database has 15000+ objects - java

I am using db4o 8.0.
I have a class:
class PostedMessage {
    @Indexed
    long receivedTime;
    @Indexed
    long sentTime;
    ...
    // getter and setter methods for all the fields.
}
I persist PostedMessage objects to a db4o database. I have already saved 15000+ objects, and now when I run the following query, it results in an OutOfMemoryError.
// Query to get PostedMessages between "start" and "end" dates.
Query q = db.query();
q.constrain(PostedMessage.class);
Constraint from = q.descend("receivedTime").constrain(new Long(start.getTimeInMillis())).greater().equal();
q.descend("receivedTime").constrain(new Long(end.getTimeInMillis())).smaller().equal().and(from);
ObjectSet<PostedMessage> result = q.execute(); // results in OutOfMemoryError
To avoid the OutOfMemoryError, I need to add indexes to the fields of the PostedMessage class. Read this.
I have a server/client configuration. I don't have control over pre-configuring the ObjectContainer before opening it.
I will have to apply/append the indexing CommonConfiguration after the ObjectContainer has been opened and handed to me.
I know how to create the config.
EmbeddedConfiguration appendConfig = Db4oEmbedded.newConfiguration();
appendConfig.common().objectClass(PostedMessage.class).objectField("receivedTime").indexed(true);
appendConfig.common().objectClass(PostedMessage.class).objectField("sentTime").indexed(true);
I am not able to figure out how to apply this config to already opened ObjectContainer.
How can I add indexes to the just opened ObjectContainer?
Is EmbeddedConfigurationItem's apply() method the answer? If it is, can I get a sample code that shows how to use it?
Edit: Added the @Indexed annotation to the question later.

Look in the reference doc at @Indexed.

cl-r's suggestion of using TA/TP worked like a charm in my case. His comment:
You also have to install Transparent Activation/Transparent
Persistence to avoid loading unnecessary objects into memory. Look at
chapters 10 & 11 in the tutorial (in the doc/tutorial directory of the
downloaded db4o[Version].zip) - cl-r
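For reference, a minimal sketch of what installing Transparent Activation/Transparent Persistence looks like in db4o 8, assuming the configuration can be supplied when the container is opened; the persisted classes also need to implement Activatable (or be bytecode-instrumented), and the file name is just a placeholder:
import com.db4o.ta.TransparentPersistenceSupport;

EmbeddedConfiguration taConfig = Db4oEmbedded.newConfiguration();
// Installs TA/TP so objects are activated only when their fields are actually
// read, instead of whole object graphs being pulled into memory.
taConfig.common().add(new TransparentPersistenceSupport());
ObjectContainer container = Db4oEmbedded.openFile(taConfig, "your-data.db4o");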

In my particular case, I need to iterate over the ObjectSet returned by the query.
I found that using the IMMEDIATE and SNAPSHOT query modes also solved the OutOfMemoryError problem, and the timings were comparable. LAZY mode is not the solution for me.
It took about 8000 to 9000 ms to retrieve any 100 PostedMessages out of 100000 saved PostedMessages, e.g. 1 to 100, 1001 to 1100, 99899 to 99999.
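For completeness, a rough sketch of how the query evaluation mode is chosen, again assuming the configuration can be passed in when the container is opened:
import com.db4o.config.QueryEvaluationMode;

EmbeddedConfiguration modeConfig = Db4oEmbedded.newConfiguration();
// SNAPSHOT and IMMEDIATE evaluate the query when execute() is called;
// LAZY defers evaluation until the ObjectSet is iterated.
modeConfig.common().queries().evaluationMode(QueryEvaluationMode.SNAPSHOT);
ObjectContainer container = Db4oEmbedded.openFile(modeConfig, "your-data.db4o");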

You should add indexes for your queries. Otherwise db4o has to scan over all objects.
You can do it with an annotation, like this:
import com.db4o.config.annotations.Indexed;
class PostedMessage {
    @Indexed
    long receivedTime;
    @Indexed
    long sentTime;
    ...
}
Or as you do, with the configuration:
EmbeddedConfiguration config = Db4oEmbedded.newConfiguration();
config.common().objectClass(PostedMessage.class).objectField("receivedTime").indexed(true);
config.common().objectClass(PostedMessage.class).objectField("sentTime").indexed(true);
ObjectContainer container = Db4oEmbedded.openFile(config, "your-data.db4o");
You cannot add this configuration once the container is already running; it can only be applied when the container is opened. If the indexes do not exist yet, they will be built while the database is being opened. So you either need to get control over the opening step, or use the annotation above.
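Since the question mentions a client/server setup: the field indexes can only be configured on the server side, when the ObjectServer is opened. A rough sketch, assuming db4o's client/server API and a placeholder file name, port and credentials:
import com.db4o.ObjectServer;
import com.db4o.cs.Db4oClientServer;
import com.db4o.cs.config.ServerConfiguration;

ServerConfiguration serverConfig = Db4oClientServer.newServerConfiguration();
serverConfig.common().objectClass(PostedMessage.class).objectField("receivedTime").indexed(true);
serverConfig.common().objectClass(PostedMessage.class).objectField("sentTime").indexed(true);
// Missing indexes are built while the server opens the database file.
ObjectServer server = Db4oClientServer.openServer(serverConfig, "your-data.db4o", 8732);
server.grantAccess("user", "password");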

Related

Why does the Google Datastore console behave differently to the GAE Java library for Datastore?

I have a Google App Engine + Java app which has been happily running for many years (using JDO + datastore for persistence) and I have had no problem (occasionally, and reluctantly) updating a property of an entity in the Google Datastore console manually.
Recently (maybe the last 2-3 months) I have noticed a change in behaviour which breaks our app. I do not understand exactly what's going wrong or how we could handle it.
So my question is:
Why is it behaving differently and what can I do about it?
Let me first try to explain the behaviour I am seeing and then show my smallest possible replicating test case.
Suppose you had a simple persistence class:
@PersistenceCapable
public class Account implements Serializable {
    @Persistent private ShortBlob testShortBlob;
    @Persistent private String name;
    // ...etc...
}
If I edited the name via the Datastore web console in the past, it would work as expected, the name field would change and everything else would work fine.
The behaviour I am seeing now is that after saving the entity via the console, I can no longer query and load the entity in JDO; I get:
java.lang.ClassCastException: com.google.appengine.api.datastore.Blob cannot be cast to com.google.appengine.api.datastore.ShortBlob
This points to some underlying datastore change that means the ShortBlob field is having its type changed from ShortBlob to Blob (even though I make no edits to that field via the console).
This test case will replicate the issue:
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

// this one really is a ShortBlob - it will load fine in JDO
Entity account = new Entity("Account", "123");
account.setProperty("name", "Test Name");
account.setUnindexedProperty("testShortBlob", new ShortBlob("blah".getBytes()));
datastore.put(account);

// this one really is not a ShortBlob, it's a Blob - it will fail for the same reason I am seeing in production
account = new Entity("Account", "124");
account.setProperty("name", "Test Name 2");
account.setUnindexedProperty("testShortBlob", new Blob("blah".getBytes()));
datastore.put(account);

// then load the entity via JDO (pm is a PersistenceManager, key is the key of entity "124")
try {
    Account accountFromJdo = pm.getObjectById(Account.class, key);
} catch (Exception ex) {
    System.out.println("We get here, the object won't load with the ClassCastException");
}
So that's the issue, but why would saving via the Cloud Datastore console change the ShortBlobs to Blobs?
My workaround currently is to set the ShortBlob fields to null in the Datastore console - that will then allow the entity to load. But that sucks if the data in the blob is important!
Update:
I have been doing more testing on this, using the low-level JSON API to see if I could spot a difference in the raw JSON responses before and after saving the entity via the console. The good news is, I can!
Before editing the entity via the console, a shortBlob field saved to the Datastore via the JDO App Engine interface will look like this:
},
"testShortBlob": {
"blobValue": "tNp7MfsjhdfjkahsdvfkjhsdvfIItWyzy6glmIrow4WWhRPbhQ/U+MGX3opVvpxu"
},
But if I go into the Datastore console and edit the entity (leaving the blob field unchanged, editing an unrelated field such as name), then when I run the same query I get:
},
"testShortBlob": {
"blobValue": "tNp7MfsjhdfjkahsdvfkjhsdvfIItWyzy6glmIrow4WWhRPbhQ/U+MGX3opVvpxu",
"excludeFromIndexes": true
},
A subtle difference, but I think it's important: according to the Java docs, ShortBlob values are indexed and Blob values are not.
So I think my question now is: why does editing an entity via the Cloud Datastore console change the indexed status of blob fields?
Thanks for the detailed question and debugging. This seems fishy. I will make sure https://issuetracker.google.com/issues/79547492 gets assigned to the correct team.
As far as workarounds go:
The JSON API you noticed is the Cloud Datastore API v1; there are a variety of client libraries to help make it easy to access.
It is possible to use that API to transactionally read/modify/write entities. In your case it would allow you to perform the desired transforms. Alternatively, making modifications through JDO would also work.
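If it helps, here is a rough sketch of that read/modify/write repair using the low-level Java API rather than the JSON API; the key and property names are taken from the test case above and would need to be adapted to your real entities:
import com.google.appengine.api.datastore.*;

DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Transaction txn = datastore.beginTransaction();
try {
    // Load the damaged entity, convert the widened Blob back to a ShortBlob,
    // and write it back so JDO can map the field again.
    Entity account = datastore.get(txn, KeyFactory.createKey("Account", "124"));
    Object stored = account.getProperty("testShortBlob");
    if (stored instanceof Blob) {
        account.setUnindexedProperty("testShortBlob", new ShortBlob(((Blob) stored).getBytes()));
        datastore.put(txn, account);
    }
    txn.commit();
} catch (EntityNotFoundException e) {
    // Nothing to repair for this key.
} finally {
    if (txn.isActive()) {
        txn.rollback();
    }
}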

WebSocketGremlinRequestEncoder must produce at least one message - janusgraph-dynamodb using withRemote "sideEffect" doesn't work

When I use a gremlin-server connection via gremlin-driver in Java, I am not able to use the "sideEffect" step of GraphTraversal.
Graph graph = EmptyGraph.instance();
Cluster cluster = Cluster.open("conf/remote-objects.yaml");
GraphTraversalSource graphTraversalSource = graph.traversal().withRemote(DriverRemoteConnection.using(cluster));
My query that uses sideEffect looks like:
AtomicLong level1 = new AtomicLong(0);
graphTraversalSource.V().hasLabel("user")
.has("uuid", "1234")
.sideEffect(it -> it.get().property("level", level1.getAndIncrement())).emit().repeat(in())
.until(loops().is(5)).valueMap("uuid", "name", "level");
This query used to work when I was using janusgraph-dynamodb-storage-backend as a dependency, running Gremlin Server within the Java application and connecting to DynamoDB. When I switched to using a remote connection to Gremlin Server running in EC2, I started getting the error message below:
java.util.concurrent.CompletionException: io.netty.handler.codec.EncoderException: WebSocketGremlinRequestEncoder must produce at least one message., took 3.895 sec
If I remove the sideEffect part from the above query, it works fine. I really need to add a custom property during traversal and include that in results without saving it in the database.
You have a few problems. The first problem is that you are trying to remote a lambda in the sideEffect(). Lambdas can't be serialized to Gremlin bytecode - at least not in the form you've provided. However, you can do this:
gremlin> cluster = Cluster.open("conf/remote-objects.yaml")
==>localhost/127.0.0.1:8182
gremlin> g = graph.traversal().withRemote(DriverRemoteConnection.using(cluster))
==>graphtraversalsource[emptygraph[empty], standard]
gremlin> g.addV('person').as('p').addE('link').to('p')
==>e[1][0-link->0]
gremlin> g.V().sideEffect(Lambda.function("it.get().property('level',1)")).valueMap()
==>[level:[1]]
Note that I had to import org.apache.tinkerpop.gremlin.util.function.* in the console to make that last line work there - that will be fixed for 3.2.7/3.3.0.
So, you could pass your lambda that way, but:
I don't think your traversal will work as before because you are referencing a variable local to the client with level1 - the server is not going to know anything about that.
TinkerPop generally recommends that you avoid lambdas.
I don't quite follow what your Gremlin is doing to provide a suggestion on how to resolve this. You do give this hint:
I really need to add a custom property during traversal and include that in results without saving it in the database.
...but the Gremlin above does write the value of level1 to the database, so I'm not sure what you are after.
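One lambda-free possibility, offered only as a sketch and assuming TinkerPop 3.2+ and that the goal is to report the traversal depth as "level" without persisting it: carry the depth in a sack instead of a client-side AtomicLong, so the whole traversal serializes as bytecode.
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.*;
import org.apache.tinkerpop.gremlin.process.traversal.Operator;

// The sack starts at 0 and is incremented by 1 on every repeat() pass,
// so each emitted vertex reports its own depth as "level".
graphTraversalSource.withSack(0)
    .V().hasLabel("user").has("uuid", "1234")
    .emit()
    .repeat(sack(Operator.sum).by(constant(1)).in())
    .until(loops().is(5))
    .project("uuid", "name", "level")
        .by("uuid").by("name").by(sack())
    .toList();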

neo4j java rest binding api not returning from getNodeAutoIndexer

I have an application which uses neo4j embedded database. Now, I want to migrate to neo4j server as I need to integrate this application with a web app (using servlets, tomcat).
I want to change the code minimally, so I thought of using the java-rest-binding API of neo4j. But I am stuck at getting the auto node index: the call to getAutoNodeIndexer doesn't return. In messages.log of the database, it shows
[o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for an additional 254ms [total block time: 2.678s]
I have no idea how to solve this.
I have set the appropriate properties in neo4j.properties, which are:
node_auto_indexing=true
node_keys_indexable=primaryKey
relationship_auto_indexing=true
relationship_keys_indexable=X-->Y
And this is what my code looks like:
graphDb = new RestGraphDatabase("http://localhost:7474/db/data/");
ReadableIndex<Node> autoNodeIndex = graphDb.index().getNodeAutoIndexer().getAutoIndex();
ReadableRelationshipIndex autoRelIndex = graphDb.index().getRelationshipAutoIndexer().getAutoIndex();
It seems that there's a lot of garbage collection going on. Run your app with a bigger heap (e.g. -Xmx1g) and see what happens.
EDIT:
Also, relationship_keys_indexable=X-->Y seems strange. I'd expect a property name there. What happens if you remove this property or enter a valid value?
To quote the official documentation:
The node_keys_indexable key allows you to specify a comma-separated
list of node property keys to be indexed. The
relationship_keys_indexable does the same for relationship property
keys.
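To illustrate what a corrected configuration might look like (the relationship property name "since" is purely hypothetical, standing in for whatever property your relationships actually carry):
node_auto_indexing=true
node_keys_indexable=primaryKey
relationship_auto_indexing=true
# a comma-separated list of relationship property keys, not a pattern like X-->Y
relationship_keys_indexable=since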

batch update in play-morphia

I'm using Play framework 1.2.5 and Play-Morphia module.
I want to know if there's a way to update many objects with one Morphia query. I've found this example at https://github.com/greenlaw110/play-morphia/blob/master/documentation/manual/crud.textile, but it seems that I can't use the "in" operation in order to find all the objects whose IDs I hold in a list.
I'm trying to update the paidInvoiceDocNum field in each of the objects whose IDs are in the list "itemsIds". This is what I've tried so far:
String q = TransactionItem.find().field("id").in(itemsIds).toString();
TransactionItem.o().set("paidInvoiceDocNum", String.valueOf(docNumber)).update(q);
Without the .toString() it doesn't work either.
Any suggestions?
After a long time of experimenting with Play-Morphia, I've found a way to do this update; here it is:
Datastore ds = TransactionItem.ds();
UpdateOperations<TransactionItem> op = ds.createUpdateOperations(TransactionItem.class).set("paidInvoiceDocNum", String.valueOf(docNumber));
Query<TransactionItem> q = (Query<TransactionItem>)TransactionItem.q().filter("id in", itemsIds).getMorphiaQuery();
ds.update(q, op);
Hope it helps...
Can you try this?
TransactionItem.o().set("paidInvoiceDocNum", docNumber).update("id in", itemsIds);
BTW, what's your Morphia version? Keep in mind Play has closed updates to modules. Use this to get the latest Morphia plugin version: https://gist.github.com/greenlaw110/2868365

How can I get the number of nodes of a Neo4j graph database from java and can we store and reuse graphdb from disk?

I just started looking at neo4j to use it for my social-network related project. During this I came across the following code:
https://github.com/neo4j/neo4j/blob/1.9.M04/community/embedded-examples/src/main/java/org/neo4j/examples/EmbeddedNeo4jWithIndexing.java
While going through it (please refer to the above link for the code), I was struggling to figure out how to get the total number of nodes added to a given graphDb. Is there any way to find it (the total number of nodes) using graphDb, nodeIndex, referenceIndex or anything else? If yes, how?
I also need to know how to store the graphdb at any given path on disk, and how to load that stored graphdb and perform operations on it, like searching for a node/relationship etc.
(There are several files like *.db, *.id, *.keys etc. created at the given DB_PATH when the above code is executed. What are all those files for? Do any of them contain the created nodes? If yes, how can we use them?)
How can we access this graphDb from web interfaces like the dashboard at http://localhost:7474/webadmin/ or the data endpoint at http://localhost:7474/db/data/ ?
Please let me know in case you need any specific information to help me..
Thanks, Nitin.
For getting started with Neo4j Embedded and the Java API see:
http://docs.neo4j.org/chunked/milestone/tutorials-java-embedded.html
Getting correct counts of nodes and relationships:
IteratorUtil.count(GlobalGraphOperations.at(gdb).getAllNodes())
IteratorUtil.count(GlobalGraphOperations.at(gdb).getAllRelationships())
For accessing an embedded graph database with an integrated neo4j server, see
http://docs.neo4j.org/chunked/milestone/server-embedded.html
Phewww! Those are a lot of questions for one entry...
To get the total number of nodes and relationships in your DB, use:
NodeManager nodeManager = ((GraphDatabaseAPI) graphDb).getDependencyResolver()
        .resolveDependency(NodeManager.class);
long currentRelationships = nodeManager.getNumberOfIdsInUse(Relationship.class);
long currentNodes = nodeManager.getNumberOfIdsInUse(Node.class);
To change the path of the graph DB, simply pass the path to the GraphDatabaseFactory().newEmbeddedDatabase() method. In the example you mentioned, you could simply set DB_PATH to e.g. /home/youruser/neo4j.
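For example, a minimal sketch with the Neo4j 1.9-era API used by the linked example (the path is just an illustration):
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

// Opens (or creates) an embedded database at the given directory on disk.
GraphDatabaseService graphDb = new GraphDatabaseFactory().newEmbeddedDatabase("/home/youruser/neo4j");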
To access your DB with the webadmin, download neo4j, change the org.neo4j.server.database.location property in the file conf/neo4j-server.properties and point it to the path of your DB and launch the server.
