As proposed in the previous discussion Using file system instead of database to store pdf files in jackrabbit,
we can use FileDataStore to store blob files in the file system instead of the database (in my case I have stored PDFs of about 100 KB each).
The problem I have now run into is dealing with files that were previously stored in the BLOB store: I want them to remain accessible after switching to FileDataStore.
After adding FileDataStore support to repository.xml,
using the JcrUtils method getOrAddNode throws ItemExistsException:
public static Node getOrAddNode(Node parent, String name)
        throws RepositoryException {
    if (parent.hasNode(name)) {
        return parent.getNode(name);
    } else {
        return parent.addNode(name);
    }
}
That is, parent.hasNode(name) returns false (it seems the item doesn't exist),
but then we fall into parent.addNode(name), which consequently throws ItemExistsException.
Any help?
Is it necessary to migrate the blobs to the FileDataStore, or is there some kind of configuration that lets Jackrabbit search for blobs in several locations at the same time: in my case, the MySQL database and the file system?
Some comments:
I have found several ways that could help with the migration job:
the wiki page http://wiki.apache.org/jackrabbit/BackupAndMigration, which describes using the JCR API (Session.exportSystemView(..) followed by Session.importXML(..)), using the RepositoryCopier API, etc.
the jackrabbit-jcr-import-export-tool (see http://svn.apache.org/repos/asf/jackrabbit/sandbox/jackrabbit-jcr-import-export-tool/README.txt)
using the Jackrabbit standalone server (http://jackrabbit.apache.org/standalone-server.html)
It might be possible that there is repository corruption. That is, the node contains a child node entry for the given name (the node you want to add), but the child node itself doesn't exist. Especially in older versions of Jackrabbit you could get into this situation if multiple sessions concurrently tried to change the same nodes.
To fix such corruption problems, the bundle db persistence managers support a consistency check & fix feature. You would need to set those options in the repository.xml and workspace.xml files, and restart Jackrabbit. Once fixed, you can disable those options again.
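For example, on a MySQL bundle persistence manager the relevant part of the configuration could look like this (the class name is illustrative; keep whichever persistence manager and connection parameters you already use):

<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
    <!-- existing connection parameters stay unchanged -->
    <param name="consistencyCheck" value="true"/>
    <param name="consistencyFix" value="true"/>
</PersistenceManager>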
There is also a way to fix such problems at runtime: set the system property org.apache.jackrabbit.autoFixCorruptions to true, and then traverse over all nodes in the repository.
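A minimal sketch of that approach, assuming an already-open JCR Session (the class and method names are mine, not Jackrabbit's):

import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class CorruptionSweep {

    // Start the JVM with -Dorg.apache.jackrabbit.autoFixCorruptions=true,
    // then call this once so every node in the repository is read.
    public static void sweep(Session session) throws RepositoryException {
        visit(session.getRootNode());
    }

    private static void visit(Node node) throws RepositoryException {
        NodeIterator it = node.getNodes();
        while (it.hasNext()) {
            visit(it.nextNode());
        }
    }
}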
I am currently running some queries using the Java API provided by MarkLogic. I installed it by adding the required dependencies to my project. The connection is set up using
DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8000, secContext, ConnectionType.DIRECT);
From here, some XQueries are run using the code shown below:
ServerEvaluationCall evl = client.newServerEval().xquery(query);
EvalResultIterator evr = evl.eval();
while (evr.hasNext()) {
    // Do something with the results
}
However, certain queries take a long time to process, causing an internal error. Other than reducing the query's running time, is there a way to overcome this, such as increasing the connection time limit?
===Update===
Query used
xquery version "1.0-ml";
let $query-opts := /comments[fn:matches(text,".*generation.*")]
return(
$query-opts, fn:count($query-opts), xdmp:elapsed-time()
)
I know the regular expression used could easily be replaced by a word-query, but in this instance I would like to use a regular expression for the search.
Example Data
<comments>
    <date_commented>1998-01-14T04:32:30</date_commented>
    <text>iCloud sync settings are not supposed to change after an iOS update. In the case of iOS 10.3 this was due to a bug.</text>
    <uri>/comment/000000001415898</uri>
</comments>
On the basis of the data you provided, I'd use xdmp:estimate and a cts query.
xdmp:estimate(cts:search(doc(), cts:and-query((
cts:directory-query('/comment/'),
cts:element-word-query(xs:QName("text"), "generation")
))))
This will search all documents in your /comment/ directory for a text element containing the word generation. As you already know, this only uses indexes and does not require loading/parsing documents.
This also will not find any false positives, because there is only one text element per document/fragment (if your sample data is representative).
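If you want to run that from the Java API shown in the question, here is a hedged sketch reusing the same client and eval pattern:

String estimateQuery =
    "xdmp:estimate(cts:search(doc(), cts:and-query((" +
    " cts:directory-query('/comment/')," +
    " cts:element-word-query(xs:QName('text'), 'generation')))))";
ServerEvaluationCall evl = client.newServerEval().xquery(estimateQuery);
EvalResultIterator evr = evl.eval();
while (evr.hasNext()) {
    System.out.println(evr.next().getString()); // the estimated match count
}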
I have defined a data table and associated objects in Liferay 6, but when I run the code it says the table doesn't exist, and that's true. The code runs fine when I create the table by hand, just copy-pasting the CREATE TABLE statement from the model implementation, but I expected the table to be created on deployment.
The user has all the privileges needed to create it.
What am I missing?
I faced the same problem, and #urvish is correct: you have to change the build number in the service.properties file.
Problem:
When multiple developers work on a portlet that uses Service Builder, you will get the exception "Build namespace has build number which is newer than ...". When one developer commits the service.properties file and it is then deployed on another developer's machine, that machine throws this exception.
Best practice: to avoid this kind of error, follow these steps (a sample override file follows the list):
Create a service-ext.properties file at the same location as service.properties.
Add build.number={higher value or the same value as in the exception}.
Deploy the portlet again.
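For illustration, the override file can be as small as this (9999 is the fallback value suggested below):

# service-ext.properties, placed next to service.properties
build.number=9999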
Check the value of build.namespace in the service.properties file and the value returned by:
select buildNumber from servicecomponent where buildNamespace = <<build.namespace from service.properties>>
The buildNumber returned by the query must be less than the value of the build.number property in service.properties. If it is not, just set build.number to 9999.
Sometimes, due to this mismatch, changes are not applied to the database.
I'm using Neo4j 2.2.8 and Spring Data in a web application. I'm using xml to configure my database, like:
<neo4j:config storeDirectory="S:\Neo4j\mybase" />
But I'm trying to use a Batch Inserter to add more than 1 million nodes read from a .txt file. After reading the file and building the list of objects, my batch code looks something like this:
public void batchInserter(List<Objects> objects) {
    BatchInserter inserter = null;
    try {
        inserter = BatchInserters.inserter("S:\\Neo4j\\mybase");
        Label movimentosLabel = DynamicLabel.label("Movimentos");
        // Deferred index on "documento"; it is built when the inserter shuts down.
        inserter.createDeferredSchemaIndex(movimentosLabel).on("documento").create();
        for (Objects objs : objects) {
            Map<String, Object> properties = new HashMap<>();
            properties.put("documento", objs.getDocumento());
            long movimento = inserter.createNode(properties, movimentosLabel);
            DynamicRelationshipType relacionamento = DynamicRelationshipType.withName("CONTA_MOVIMENTO");
            inserter.createRelationship(movimento, objs.getConta().getId(), relacionamento, null);
        }
    } finally {
        if (inserter != null) {
            inserter.shutdown();
        }
    }
}
Is it possible for the "inserter" to use the database path configured in my XML? I ask because, with the configuration above, Neo4j gives me an error about multiple connections. Is there a property I can set to solve this multiple-connections error? Has anyone had this problem and an idea how to solve it? Ideas are welcome.
Thanks to everyone!
Your question has several pieces to it:
Error About Multiple Connections
If you're using spring-data with a local database tied to a particular directory or file, be aware that you can't have two Neo4j processes opening the same DB at the same time. This means that if you've decided to run BatchInserter against the same file/directory, that can't happen at all while the JVM that's using the spring-data DB is running. There isn't a way I know of to get around that problem. One option is to not use the batch inserter against the file, but to do the inserting via the REST API instead.
get the path of my database configured in my xml
Sure, there's a way to do that; you'd have to consult the relevant documentation. I can't give you the code for it because it depends on which config file you're talking about and how it's structured, but in essence there should be a way to inject the right object into your code here and read the property from the XML file out of that injected object.
But that won't help you given your "Multiple connections" issue mentioned above.
Broadly, I think your solution is one of the following:
Don't run your Spring app and your batch inserter at the same time.
Run your Spring app, but do the inserting via the REST API or another method, so there is no multiple-connection issue to begin with (a minimal sketch follows).
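Here is a minimal sketch of that second option, using the java-rest-binding client that also appears in the next question. The label, property, and relationship names mirror the batch code above, the node id is illustrative, and transaction semantics over REST are limited, so treat this as a sketch only:

// RestGraphDatabase comes from the java-rest-binding library (org.neo4j.rest.graphdb).
GraphDatabaseService restDb = new RestGraphDatabase("http://localhost:7474/db/data/");
Transaction tx = restDb.beginTx();
try {
    Node movimento = restDb.createNode(DynamicLabel.label("Movimentos"));
    movimento.setProperty("documento", "12345"); // illustrative value
    Node conta = restDb.getNodeById(42L);        // illustrative existing node
    movimento.createRelationshipTo(conta, DynamicRelationshipType.withName("CONTA_MOVIMENTO"));
    tx.success();
} finally {
    tx.finish();
}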
I have an application which uses an embedded Neo4j database. Now I want to migrate to the Neo4j server, as I need to integrate this application with a web app (using servlets, Tomcat).
I want to change the code minimally, so I thought of using the java-rest-binding API of Neo4j. But I am stuck at getting the auto node index: the method getAutoNodeIndexer doesn't return. In the database's messages.log, it shows
[o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for an additional 254ms [total block time: 2.678s]
I have no idea how to solve this.
I have set the appropriate properties in the neo4j.properties, which are
node_auto_indexing=true
node_keys_indexable=primaryKey
relationship_auto_indexing=true
relationship_keys_indexable=X-->Y
And this is what my code looks like:
graphDb = new RestGraphDatabase("http://localhost:7474/db/data/");
ReadableIndex<Node> autoNodeIndex = graphDb.index().getNodeAutoIndexer().getAutoIndex();
ReadableRelationshipIndex autoRelIndex = graphDb.index().getRelationshipAutoIndexer().getAutoIndex();
It seems that there's a lot of garbage collection going on. Run your app with a bigger heap (e.g. -Xmx1g) and see what happens.
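For instance (the jar name is a placeholder; the GC flag just prints collection details so you can see what is happening):

java -Xmx1g -XX:+PrintGCDetails -jar your-app.jar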
EDIT:
Also, relationship_keys_indexable=X-->Y seems strange. I'd expect a property name there. What happens if you remove this property or enter a valid value?
To quote the official documentation:
The node_keys_indexable key allows you to specify a comma-separated list of node property keys to be indexed. The relationship_keys_indexable does the same for relationship property keys.
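So a corrected neo4j.properties could look like the following, where since is a hypothetical relationship property key; list whatever keys your relationships actually carry:

node_auto_indexing=true
node_keys_indexable=primaryKey
relationship_auto_indexing=true
relationship_keys_indexable=since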
I just started looking at neo4j to use it for my social-network related project. During this I came across the following code:
https://github.com/neo4j/neo4j/blob/1.9.M04/community/embedded-examples/src/main/java/org/neo4j/examples/EmbeddedNeo4jWithIndexing.java
While going through it (please refer to the link above for the code), I struggled to find out how to get the total number of nodes added to a given graphDb. Is there any way to find it (the total number of nodes) using graphDb, nodeIndex, referenceIndex, or anything else? If yes, how?
I also need help with how to store the graph DB at any given path on disk, and how to load that stored DB and perform operations on it, like searching for a node/relationship.
(There are several files like *.db, *.id, *.keys etc. created at the given DB_PATH when the code above is executed. What are all those files used for? Do any of them contain the created nodes? If yes, how can we use them?)
How can we access this graphDb from web interfaces like the Dashboard at http://localhost:7474/webadmin/ or the data endpoint at http://localhost:7474/db/data/ ?
Please let me know in case you need any specific information to help me.
Thanks, Nitin.
For getting started with Neo4j Embedded and the Java API see:
http://docs.neo4j.org/chunked/milestone/tutorials-java-embedded.html
Getting correct counts of nodes and relationships:
IteratorUtil.count(GlobalGraphOperations.at(gdb).getAllNodes())
IteratorUtil.count(GlobalGraphOperations.at(gdb).getAllRelationships())
For accessing an embedded graph database with an integrated neo4j server, see
http://docs.neo4j.org/chunked/milestone/server-embedded.html
Phewww! Those are a lot of questions for one entry...
To get the total number of nodes and relationships in your DB, use:
NodeManager nodeManager = ((GraphDatabaseAPI) graphDb).getDependencyResolver()
        .resolveDependency(NodeManager.class);
long currentRelationships = nodeManager.getNumberOfIdsInUse(Relationship.class);
long currentNodes = nodeManager.getNumberOfIdsInUse(Node.class);
To change the path of the graph DB, pass the desired path to the GraphDatabaseFactory().newEmbeddedDatabase method. In the example you mentioned, you could simply set DB_PATH to e.g. /home/youruser/neo4j.
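A minimal sketch of that, using the illustrative path from above:

GraphDatabaseService graphDb = new GraphDatabaseFactory()
        .newEmbeddedDatabase("/home/youruser/neo4j");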
To access your DB with the webadmin, download Neo4j, change the org.neo4j.server.database.location property in the file conf/neo4j-server.properties so that it points to the path of your DB, and launch the server.
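That is a one-line change, again with the illustrative path:

# conf/neo4j-server.properties
org.neo4j.server.database.location=/home/youruser/neo4j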