Neo4j node indexing: how to change Lucene default similarity?


I'm using Neo4j 1.9 with Lucene Core 3.5, through the support offered by the neo4j-lucene-index library.
In my code, I create a new node index like this:
HashMap<String, String> stringMap = new HashMap<String, String>();
stringMap.put("type", "fulltext");
stringMap.put(IndexManager.PROVIDER, "lucene");
stringMap.put("to_lower_case", "false");
stringMap.put("similarity", MySimilarity.class.getName());
stringMap.put("analyzer", MyAnalyzer.class.getName());
simpleIndex = (LuceneIndex<Node>) template.getGraphDatabaseService()
.index().forNodes("simple-index", stringMap);
I then query the node index with a multi-field query:
Analyzer analyzer = new MyAnalyzer();
Query query = null;
try {
query = MultiFieldQueryParser.parse(Version.LUCENE_35,
queriesArray, fieldsArray, flagsArray, analyzer);
} catch (ParseException e) {
e.printStackTrace();
return;
}
if (query == null)
return;
QueryContext qc = new QueryContext(query).sortByScore();
IndexHits<Node> hits = simpleIndex.query(qc);
How can I make the searcher use MySimilarity (which extends DefaultSimilarity and overrides some of its methods) when sorting by score? The query above always uses DefaultSimilarity, and I was not able to find any hook to set the similarity used by the searcher that runs my query.

The fact that the supplied similarity isn't used during queries is a bug and should be fixed. Created https://github.com/neo4j/neo4j/issues/375 to track it.

Related

Lucene index: getting an empty result when querying

I am trying to query with a Lucene index, but I get an empty result and the errors below in the log:
Traversal query (query without index): select [jcr:path] from [nt:base] where isdescendantnode('/test') and name='World'; consider creating an index
[async] The index update failed
org.apache.jackrabbit.oak.api.CommitFailedException: OakAsync0002: Missing index provider detected for type [counter] on index [/oak:index/counter]
I am using the RDB DocumentStore, and I have checked that the index and the node are created in the nodes table. I tried the code below:
@Autowired
NodeStore rdbNodeStore;
// create repository
LuceneIndexProvider provider = new LuceneIndexProvider();
ContentRepository repository = new Oak(rdbNodeStore)
.with(new OpenSecurityProvider())
.with(new InitialContent())
.with((QueryIndexProvider) provider)
.with((Observer) provider)
.with(new LuceneIndexEditorProvider())
.withAsyncIndexing("async", 5)
.createContentRepository();
// log in to the repository and retrieve a session
ContentSession contentSession = repository.login(null, null);
Root root = contentSession.getLatestRoot();
//create lucene index
Tree index = root.getTree("/");
Tree t = index.addChild("oak:index");
t = t.addChild("lucene");
t.setProperty("jcr:primaryType", "oak:QueryIndexDefinition", Type.NAME);
t.setProperty("compatVersion", Long.valueOf(2L), Type.LONG);
t.setProperty("type", "lucene", Type.STRING);
t.setProperty("async", "async", Type.STRING);
t = t.addChild("indexRules");
t = t.addChild("nt:base");
Tree propnode = t.addChild("properties");
Tree t1 = propnode.addChild("name");
t1.setProperty("name", "name");
t1.setProperty("propertyIndex", Boolean.valueOf(true), Type.BOOLEAN);
root.commit();
//Create TestNode
String h = "Hello" + System.currentTimeMillis();
String w = "World" + System.currentTimeMillis();
Tree test = root.getTree("/").addChild("test");
test.addChild("a").setProperty("name", Arrays.asList(new String[] { h, w }), Type.STRINGS);
test.addChild("b").setProperty("name", h);
root.commit();
//Search
String query = "select [jcr:path] from [nt:base] where isdescendantnode('/test') and name='World' option(traversal ok)";
List<String> paths = executeQuery(root, query, "JCR-SQL2", true, false);
for (String path : paths) {
System.out.println("Path=" + path);
}
Can anyone share some sample code showing how to create a Lucene index?

There are a couple of issues with what you're likely doing. The first is the error that you're observing: you're using InitialContent, which provisions an index with type="counter". For that you need .with(new NodeCounterEditorProvider()) while building the repository. That should avoid the error you are seeing.
But your code would likely still not work, because Lucene indexes are async (which you've correctly configured). Due to that asynchronous behavior, you can't query immediately after adding the node.
I tried your code but had to add something like Thread.sleep(10*1000) before querying.
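A fixed sleep works but is either too long or occasionally too short. As a rough sketch of an alternative, the query can be retried until it returns rows or a deadline passes. The class and method names below (AsyncIndexPoller, awaitResults) are hypothetical and not part of Oak; the Supplier stands in for whatever actually runs the query, such as the executeQuery call from the question.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper (not an Oak API): re-runs a query until it returns rows or a deadline passes. */
final class AsyncIndexPoller {

    private AsyncIndexPoller() {}

    static List<String> awaitResults(Supplier<List<String>> query,
                                     Duration timeout,
                                     Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            List<String> rows = query.get();
            // Stop as soon as the index has caught up, or when the time budget is spent.
            if (!rows.isEmpty() || System.nanoTime() - deadline >= 0) {
                return rows;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return rows;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for the real query: returns nothing on the first two calls.
        int[] calls = {0};
        Supplier<List<String>> fakeQuery =
                () -> ++calls[0] < 3 ? List.<String>of() : List.of("/test/a");

        List<String> paths = awaitResults(fakeQuery, Duration.ofSeconds(5), Duration.ofMillis(50));
        for (String path : paths) {
            System.out.println("Path=" + path);
        }
    }
}
```

In the question's code, the supplier would wrap the existing executeQuery(root, query, "JCR-SQL2", true, false) call; the timeout should comfortably exceed the 5-second async indexing interval configured on the repository.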
As another side note, I'd recommend trying IndexDefinitionBuilder to build the Lucene index structure. You could replace
Tree index = root.getTree("/");
Tree t = index.addChild("oak:index");
t = t.addChild("lucene");
t.setProperty("jcr:primaryType", "oak:QueryIndexDefinition", Type.NAME);
t.setProperty("compatVersion", Long.valueOf(2L), Type.LONG);
t.setProperty("type", "lucene", Type.STRING);
t.setProperty("async", "async", Type.STRING);
t = t.addChild("indexRules");
t = t.addChild("nt:base");
Tree propnode = t.addChild("properties");
Tree t1 = propnode.addChild("name");
t1.setProperty("name", "name");
t1.setProperty("propertyIndex", Boolean.valueOf(true), Type.BOOLEAN);
root.commit();
with
IndexDefinitionBuilder idxBuilder = new IndexDefinitionBuilder();
idxBuilder.indexRule("nt:base").property("name").propertyIndex();
idxBuilder.build(root.getTree("/").addChild("oak:index").addChild("lucene"));
root.commit();
The latter approach, IMO, is less error-prone and more readable.

Alfresco buildonly indexer for searching properties created on the fly

I am using Alfresco 5.1.
One of my requirements is to create properties (key/value) where the user enters both the key and the value.
I have done that like this:
Map<QName, Serializable> props = new HashMap<QName, Serializable>();
props.put(QName.createQName("customProp1"), "prop1");
props.put(QName.createQName("customProp2"), "prop2");
ChildAssociationRef associationRef = nodeService.createNode(nodeService.getRootNode(storeRef), ContentModel.ASSOC_CHILDREN, QName.createQName(GUID.generate()), ContentModel.TYPE_CMOBJECT, props);
Now I want to search for nodes by these newly created properties. I was able to search on a newly created property like this:
public List<NodeRef> findNodes() throws Exception {
authenticate("admin", "admin");
StoreRef storeRef = new StoreRef(StoreRef.PROTOCOL_WORKSPACE, "SpacesStore");
List<NodeRef> nodeList = null;
Map<QName, Serializable> props = new HashMap<QName, Serializable>();
props.put(QName.createQName("customProp1"), "prop1");
props.put(QName.createQName("customProp2"), "prop2");
ChildAssociationRef associationRef = nodeService.createNode(nodeService.getRootNode(storeRef), ContentModel.ASSOC_CHILDREN, QName.createQName(GUID.generate()), ContentModel.TYPE_CMOBJECT, props);
NodeRef nodeRef = associationRef.getChildRef();
String query = "@cm\\:customProp1:\"prop1\"";
SearchParameters sp = new SearchParameters();
sp.addStore(storeRef);
sp.setLanguage(SearchService.LANGUAGE_LUCENE);
sp.setQuery(query);
try {
ResultSet results = serviceRegistry.getSearchService().query(sp);
nodeList = new ArrayList<NodeRef>();
for (ResultSetRow row : results) {
nodeList.add(row.getNodeRef());
System.out.println(row.getNodeRef());
}
System.out.println(nodeList.size());
} catch (Exception e) {
e.printStackTrace();
}
return nodeList;
}
The alfresco-global.properties indexer configuration is
index.subsystem.name=buildonly
index.recovery.mode=AUTO
dir.keystore=${dir.root}/keystore
Now my questions are:
Is it possible to achieve the same using the solr4 indexer?
Or is there any way to use the buildonly indexer for a particular query?

In your query
String query = "@cm\\:customProp1:\"prop1\"";
remove cm: you are building the QName on the fly, so it does not belong to the cm (ContentModel) namespace. Your query then becomes
String query = "@\\:customProp1:\"prop1\"";
Hope this works for you.

First, double-check whether you're simply experiencing eventual consistency, as described at the link below. If you are, and if this presents a problem for you, consider switching to CMIS queries while staying on Solr.
http://docs.alfresco.com/5.1/concepts/solr-event-consistency.html
Otherwise, check whether the node has been indexed at all. If it has, take a closer look at how you build your query.
How to find List of unindexed file in alfresco

Fetch all facet records against a specific field in Lucene

I'm a beginner learning Lucene. I've implemented a facet-count program in Lucene 6.0.2 that outputs the counts for all facet fields. Now I want to search for a city, "California", and have the result show all facet counts with respect to that query. How can I do this?
Here is the code:
public List<FacetResult> runSearch() throws IOException {
DirectoryReader indexReader = DirectoryReader.open(indexDir);
IndexSearcher searcher = new IndexSearcher(indexReader);
TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir);
FacetsConfig config = new FacetsConfig();
FacetsCollector fc = new FacetsCollector();
FacetsCollector.search(searcher, new MatchAllDocsQuery(), 10, fc);
List<FacetResult> results = new ArrayList<>();
Facets facets = new FastTaxonomyFacetCounts(taxoReader, config, fc);
results.add(facets.getTopChildren(10, "city"));
results.add(facets.getTopChildren(10, "make"));
results.add(facets.getTopChildren(10, "year"));
results.add(facets.getTopChildren(10, "model"));
indexReader.close();
taxoReader.close();
return results;
}

From your example I gather that your first search matched everything (*). To search for "* AND city:california" (pretty useless in this case, but it works as an example), you can build a DrillDownQuery from your original query and the filter, e.g.:
Query baseQuery = new MatchAllDocsQuery();
DrillDownQuery ddQuery = new DrillDownQuery(config, baseQuery);
ddQuery.add("city", "california");
FacetsCollector fc = new FacetsCollector();
FacetsCollector.search(searcher, ddQuery, 10, fc);
add can be called multiple times. If you call it again for the same field, the value is combined with OR; if you call it for a different field, that restriction is combined with AND, e.g. (* AND (city:california OR city:york) AND year:1980).
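To make the OR-within-a-field, AND-across-fields rule concrete, here is a small plain-Java model of it. This is only an illustration of the matching logic described above, not Lucene's implementation; the class DrillDownSemantics and its methods are made up for the example and use no Lucene types.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model (hypothetical, not a Lucene class) of how repeated add(dim, value) calls combine. */
final class DrillDownSemantics {

    /** Dimension -> accepted values. */
    private final Map<String, Set<String>> filters = new LinkedHashMap<>();

    /** Plays the role of DrillDownQuery.add(dim, value) in this model. */
    DrillDownSemantics add(String dim, String value) {
        filters.computeIfAbsent(dim, d -> new LinkedHashSet<>()).add(value);
        return this;
    }

    /** Every filtered dimension must match (AND); within a dimension any accepted value will do (OR). */
    boolean matches(Map<String, String> doc) {
        for (Map.Entry<String, Set<String>> filter : filters.entrySet()) {
            if (!filter.getValue().contains(doc.get(filter.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Models (* AND (city:california OR city:york) AND year:1980)
        DrillDownSemantics query = new DrillDownSemantics()
                .add("city", "california")
                .add("city", "york")
                .add("year", "1980");

        List<Map<String, String>> docs = List.of(
                Map.of("city", "california", "year", "1980"),
                Map.of("city", "york", "year", "1990"),
                Map.of("city", "boston", "year", "1980"));

        for (Map<String, String> doc : docs) {
            System.out.println(doc.get("city") + "/" + doc.get("year") + " -> " + query.matches(doc));
        }
    }
}
```

Only the first document matches here: the second fails the year restriction and the third fails the city restriction.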

How to restrict the fields returned in MongoDB

I need to restrict the query to some of the fields in the ProcessTemplate class. Below is the method I have written. When I pass an id, it throws a runtime exception. Please help me out.
public ProcessTemplate get(String id) throws GRIDRecordsDataManagerException {
ProcessTemplate entity = null;
try {
BasicDBObject qry = new BasicDBObject();
Map<String, Object> whereMap = new HashMap<String, Object>();
whereMap.put("id", id);
qry.putAll(whereMap);
BasicDBObject field = new BasicDBObject();
field.put("name", 1);
field.put("status", 1);
field.put("description", 1);
DBCursor results = dbCollection.find(qry, field);
if (results != null && results.hasNext()) {
DBObject dbObj = results.next();
entity = new ProcessTemplate();
entity.setId((String) dbObj.get("id"));
entity.setProcessName((String) dbObj.get("name"));
entity.setStatus((String) dbObj.get("status"));
entity.setDescription((String) dbObj.get("description"));
System.out.println(entity);
}
} catch (Exception e) {
}
return entity;
}

Answer copied from jira.mongodb.org
When querying a GridFS collection on specific fields, it works when no
GridFS file was ever stored on that collection. Once a file was saved
to the collection, the query on specific fields fails with "can't load
partial GridFSFile file".
...
For now I'll reset the object associated with the collection, quite a
hack though: if (Objects.equal(GridFSDBFile.class,
coll.getObjectClass())) { coll.setObjectClass(null); }
...
Hi, A collection can have an associated ObjectClass and this
information is cached, allowing it to be set once and then reused
elsewhere in your code. Once it is set you have to explicitly unset
it. GridFS is a specification for storing and retrieving files that is
built upon the driver. GridFS is opinionated about how it is to be
used, as such it sets the ObjectClass for the files collection when
you create a GridFS instance. The reason it throws an error is that
GridFSFile is not expected to be used in the way you've shown, as it
could represent only part of a file, which is why it throws
the "can't load partial GridFSFile file" runtime error. As you've
found out, the associated ObjectClass can only be unset by resetting
it back to null.
In your case, that translates to:
BasicDBObject qry = new BasicDBObject("id",id); //You can save 3 lines of code here, btw
BasicDBObject field = new BasicDBObject();
...
if (Objects.equal(GridFSDBFile.class, dbCollection.getObjectClass()))
dbCollection.setObjectClass(null);
DBCursor results = dbCollection.find(qry, field);
...

Nodes returned from a RestCypherEngine execution produce just URLs

When executing queries on a standalone Neo4j server using the RestCypherQueryEngine, what is the best practice for retrieving a collection of nodes?
I have this code snippet running:
public DbService() {
gd = new RestGraphDatabase("http://neo4jbox:7474/db/data/");
engine = new RestCypherQueryEngine(gd.getRestAPI());
}
public String testData() {
try (Transaction tx = gd.beginTx()) {
QueryResult<Map<String, Object>> result;
result = engine.query(
"match (n:Person{username:'jomski2009'}) return n ",
null);
Iterator<Map<String, Object>> itr = result.iterator();
while (itr.hasNext()) {
Map<String, Object> item = itr.next();
log.info(item.get("n"));
}
tx.success();
return result.toString();
}
}
When I run the code, I get the following result:
services.DbService : http://neo4jbox:7474/db/data/node/177
which is a link to the node rather than the node itself. I know that returning just a subset of the node's properties in the same query works well. What I'd like to know is how to retrieve the complete node object without having to specify the properties in the query.
Thanks for your help guys.

That is just the toString() representation of a RestNode; it still has its properties. The relationships are not included, though: those are fetched on demand.
I would recommend fetching primitive values over the wire with Cypher. That works best, as it minimizes the transferred data and you only get what you need.
