Fetch all facets records against specfic field Lucene - java

I'm beginner and started learning Lucene. Currently I 've implemented a Count Facets program in lucene 6.0.2 which display or output all facets field count. But now, want to search for a city "California" and in the result it will show all facets count w.r.t this query, How to do this...
Here is the code:
public List<FacetResult> runSearch() throws IOException {
DirectoryReader indexReader = DirectoryReader.open(indexDir);
IndexSearcher searcher = new IndexSearcher(indexReader);
TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir);
FacetsConfig config = new FacetsConfig();
FacetsCollector fc = new FacetsCollector();
FacetsCollector.search(searcher, new MatchAllDocsQuery(), 10, fc);
List<FacetResult> results = new ArrayList<>();
Facets facets = new FastTaxonomyFacetCounts(taxoReader, config, fc);
results.add(facets.getTopChildren(10, "city"));
results.add(facets.getTopChildren(10, "make"));
results.add(facets.getTopChildren(10, "year"));
results.add(facets.getTopChildren(10, "model"));
indexReader.close();
taxoReader.close();
return results;
}

From your example I get that you've searched * in your first search. To search for "* AND city:california" (Pretty useless in this case, but works as an example) you can build a DrillDownQuery with your original query and the filter, e.g.
Query baseQuery = new MatchAllDocsQuery();
DrillDownQuery ddQuery = new DrillDownQuery(config, baseQuery);
ddQuery.add("city", "california");
FacetsCollector fc = new FacetsCollector();
FacetsCollector.search(searcher, ddQuery, 10, fc);
Add can be called multiple times. If you call it for the same field the value is appended with OR, if you call it for a different field those limitation is appended with AND, e.g. (* AND (city:california OR city:york) AND year:1980)

Related

Lucene queryParser builds the query correctly but search doesn't wrok

I have indexed IntPointField using lucene which I am able to fetch using below query:
Query query = IntPoint.newRangeQuery("field1", 0, 40);
TopDocs topDocs = searcher.search(query);
System.out.println(topDocs.totalHits);
its fetching the relevant correctly.
If i build the query using parse it doesn't work
Query query = new QueryParser(Version.LUCENE_8_11_0.toString(), new StandardAnalyzer()).parse("field1:[0 TO 40]");
I checked the string representation of both the query they look identical as below
field1:[0 TO 40]
Does anyone know what am I doing wrong?
IntPoint field requires custom query paser.
The below solves the problem
StandardQueryParser parser = new StandardQueryParser();
parser.setAnalyzer(new StandardAnalyzer());
PointsConfig indexableFields = new PointsConfig(new DecimalFormat(), Integer.class);
Map<String, PointsConfig> indexableFieldMap = new HashMap<>();
pointsConfigMap.put("field1", indexableFields);
parser.setPointsConfigMap(indexableFieldMap);

Lucene index: getting empty result while query

I am trying to query with Lucene index but getting the empty result and below errors in the log,
Traversal query (query without index): select [jcr:path] from [nt:base] where isdescendantnode('/test') and name='World'; consider creating an index
[async] The index update failed
org.apache.jackrabbit.oak.api.CommitFailedException: OakAsync0002: Missing index provider detected for type [counter] on index [/oak:index/counter]
I am using RDB DocumentStore and I have checked index and node are created in nodes table.i tried below code,
#Autowired
NodeStore rdbNodeStore;
//create reposiotory
LuceneIndexProvider provider = new LuceneIndexProvider();
ContentRepository repository = new Oak(rdbNodeStore)
.with(new OpenSecurityProvider())
.with(new InitialContent())
.with((QueryIndexProvider) provider)
.with((Observer) provider)
.with(new LuceneIndexEditorProvider())
.withAsyncIndexing("async",
5).createContentRepository();
//login reposiotory and retrive session
ContentSession contentSession = repository.login(null, null);
Root root = contentSession.getLatestRoot();
//create lucene index
Tree index = root.getTree("/");
Tree t = index.addChild("oak:index");
t = t.addChild("lucene");
t.setProperty("jcr:primaryType", "oak:QueryIndexDefinition", Type.NAME);
t.setProperty("compatVersion", Long.valueOf(2L), Type.LONG);
t.setProperty("type", "lucene", Type.STRING);
t.setProperty("async", "async", Type.STRING);
t = t.addChild("indexRules");
t = t.addChild("nt:base");
Tree propnode = t.addChild("properties");
Tree t1 = propnode.addChild("name");
t1.setProperty("name", "name");
t1.setProperty("propertyIndex", Boolean.valueOf(true), Type.BOOLEAN);
root.commit();
//Create TestNode
String h = "Hello" + System.currentTimeMillis();
String w = "World" + System.currentTimeMillis();
Tree test = root.getTree("/").addChild("test");
test.addChild("a").setProperty("name", Arrays.asList(new String[] { h, w }), Type.STRINGS);
test.addChild("b").setProperty("name", h);
root.commit();
//Search
String query = "select [jcr:path] from [nt:base] where isdescendantnode('/test') and name='World' option(traversal ok)";
List<String> paths = executeQuery(root, query, "JCR-SQL2", true, false);
for (String path : paths) {
System.out.println("Path=" + path);
}
can anyone share some sample code on how to create Lucene index?
There are a couple of issues with what you're likely doing. First thing would the error that you're observing. Since you're using InitialContent which provisions an index with type="counter". For that you'd need to have .with(new NodeCounterEditorProvider()) while building the repository. That should avoid the error you are seeing.
But, your code would likely still not work because lucene indexes are async (which you've correctly configured). Due to that asynchronous behavior, you can't query immediately after adding the node.
I tried your code but had to add something like Thread.sleep(10*1000) before going for querying.
As another side-note, I'd recommend that you try out IndexDefinitionBuilder to make lucene index structure. So, you could replace
Tree index = root.getTree("/");
Tree t = index.addChild("oak:index");
t = t.addChild("lucene");
t.setProperty("jcr:primaryType", "oak:QueryIndexDefinition", Type.NAME);
t.setProperty("compatVersion", Long.valueOf(2L), Type.LONG);
t.setProperty("type", "lucene", Type.STRING);
t.setProperty("async", "async", Type.STRING);
t = t.addChild("indexRules");
t = t.addChild("nt:base");
Tree propnode = t.addChild("properties");
Tree t1 = propnode.addChild("name");
t1.setProperty("name", "name");
t1.setProperty("propertyIndex", Boolean.valueOf(true), Type.BOOLEAN);
root.commit();
with
IndexDefinitionBuilder idxBuilder = new IndexDefinitionBuilder();
idxBuilder.indexRule("nt:base").property("name").propertyIndex();
idxBuilder.build(root.getTree("/").addChild("oak:index").addChild("lucene"));
root.commit();
The latter approach, imo, is less error prone and more redabale.

alfresco buildonly indexer for searching the properties created on the fly

I am using the latest version of alfresco 5.1 version.
one of my requirement is to create properties (key / value) where user enter the key as well as the value.
so I have done that like this
Map<QName, Serializable> props = new HashMap<QName, Serializable>();
props.put(QName.createQName("customProp1"), "prop1");
props.put(QName.createQName("customProp2"), "prop2");
ChildAssociationRef associationRef = nodeService.createNode(nodeService.getRootNode(storeRef), ContentModel.ASSOC_CHILDREN, QName.createQName(GUID.generate()), ContentModel.TYPE_CMOBJECT, props);
Now what I want to do is search the nodes with these newly created properties. I was able to search the newly created property like this.
public List<NodeRef> findNodes() throws Exception {
authenticate("admin", "admin");
StoreRef storeRef = new StoreRef(StoreRef.PROTOCOL_WORKSPACE, "SpacesStore");
List<NodeRef> nodeList = null;
Map<QName, Serializable> props = new HashMap<QName, Serializable>();
props.put(QName.createQName("customProp1"), "prop1");
props.put(QName.createQName("customProp2"), "prop2");
ChildAssociationRef associationRef = nodeService.createNode(nodeService.getRootNode(storeRef), ContentModel.ASSOC_CHILDREN, QName.createQName(GUID.generate()), ContentModel.TYPE_CMOBJECT, props);
NodeRef nodeRef = associationRef.getChildRef();
String query = "#cm\\:customProp1:\"prop1\"";
SearchParameters sp = new SearchParameters();
sp.addStore(storeRef);
sp.setLanguage(SearchService.LANGUAGE_LUCENE);
sp.setQuery(query);
try {
ResultSet results = serviceRegistry.getSearchService().query(sp);
nodeList = new ArrayList<NodeRef>();
for (ResultSetRow row : results) {
nodeList.add(row.getNodeRef());
System.out.println(row.getNodeRef());
}
System.out.println(nodeList.size());
} catch (Exception e) {
e.printStackTrace();
}
return nodeList;
}
The alfresco-global.properties indexer configuration is
index.subsystem.name=buildonly
index.recovery.mode=AUTO
dir.keystore=${dir.root}/keystore
Now my question is
Is it possible to achieve the same using the solr4 indexer ?
Or Is there any way to use buildonly indexer for a particular query ?
In your query
String query = "#cm\\:customProp1:\"prop1\"";
remove cm as you are building the QName on the fly so it does not come under cm i.e. (ContentModel) properties. So your query will be
String query = "#\\:customProp1:\"prop1\"";
Hope this will work for you
First, double check if you're simply experiencing eventual consistency, as described below. If you are, and if this presents a problem for you, consider switching to CMIS queries while staying on SOLR.
http://docs.alfresco.com/5.1/concepts/solr-event-consistency.html
Other than this, check if the node has been indexed at all. If it has, take a closer look at how you build your query.
How to find List of unindexed file in alfresco

Grouping Solr results in Solr 3.6.1 API causes NullPointerException when parsing result

As long as I limit my query to:
SolrQuery solrQuery = new SolrQuery();
solrQuery.set("q", query); //where query is solr query string (e.g. *:*)
solrQuery.set("start", 0);
solrQuery.set("rows", 10);
everything works fine - results are returned and so on.
Things are getting worse when I try to group results by my field "Token_group" to avoid duplicates:
SolrQuery solrQuery = new SolrQuery();
solrQuery.set("q", query); //where query is solr query string (e.g. *:*)
solrQuery.set("start", 0);
solrQuery.set("rows", 10);
solrQuery.set("group", true);
solrQuery.set("group.field", "token_group");
solrQuery.set("group.ngroups", true);
solrQuery.set("group.limit", 20);
Using this results in HttpSolrServer no exceptions are being thrown, but trying to access results ends up in NPE.
My querying Solr method:
public SolrDocumentList query(SolrQuery query) throws SolrServerException {
QueryResponse response = this.solr.query(query); //(this.solr is handle to HttpSolrSelver)
SolrDocumentList list = response.getResults();
return list;
}
note that similar grouping (using the very same field) is made in our other apps (PHP) and works fine, so this is not a schema issue.
I solved my issue. In case someone needs this in future:
When you perform a group query, you should use different methods to get and parse results.
While in ungrouped queries
QueryResponse response = this.solr.query(query); //(this.solr is handle to HttpSolrSelver)
SolrDocumentList list = response.getResults();
will work, when you want to query for groups, it won't.
So, how do I make and parse query?
Below code for building query is perfectly fine:
SolrQuery solrQuery = new SolrQuery();
solrQuery.set("q", query); //where query is solr query string (e.g. *:*)
solrQuery.set("start", 0);
solrQuery.set("rows", 10);
solrQuery.set("group", true);
solrQuery.set("group.field", "token_group");
solrQuery.set("group.ngroups", true);
solrQuery.set("group.limit", 20);
where last four lines define that Solr should group results and parameters of grouping. In this case group.limit will define how many maximum results within a group you want, and rows will tell how many max results should be there.
Making grouped query looks like this:
List<GroupCommand> groupCommands = this.solr.query(query).getGroupResponse().getValues();
referring to documentation, GroupCommand contains info about grouping as well as list of results, divided by groups.
Okay, I want to get to the results. How to do it?
Well, in my example there's only one position in List<GroupCommand> groupCommands, so to get list of found groups within it:
GroupCommand groupCommand = groupCommands.get(0);
List<Group> groups = groupCommand.getValues();
This will result in list of groups. Each group contains its own SolrDocumentList. To get it:
for(Group g : groups){
SolrDocumentList groupList = g.getResult();
(...)
}
Having this, well just proceed with SolrDocumentList for each group.
I used grouping query to get list of distinct results. How to do it?
This was exacly my case. It seems easy but there's a tricky part that can catch you if you're refactoring already running code that uses getNumFound() from SolrDocumentList.
Just analyze my code:
/**
* Gets distinct resultlist from grouped query
*
* #param query
* #return results list
* #throws SolrServerException
*/
public SolrDocumentList queryGrouped(SolrQuery query) throws SolrServerException {
List<GroupCommand> groupCommands = this.solr.query(query).getGroupResponse().getValues();
GroupCommand groupCommand = groupCommands.get(0);
List<Group> groups = groupCommand.getValues();
SolrDocumentList list = new SolrDocumentList();
if(groups.size() > 0){
long totalNumFound = groupCommand.getNGroups();
int iteratorLimit = 1;
for(Group g : groups){
SolrDocumentList groupList = g.getResult();
list.add(groupList.get(0));
//I wanted to limit list to 10 records
if(iteratorLimit++ > 10){
break;
}
}
list.setNumFound(totalNumFound);
}
return list;
}

Lucene : Changing the default facet delimiter?

First post on this wonderful site!
My goal is to use hierarchical facets for searching an index using Lucene. However, my facets need to be delimited by a character other than '/', (in this case, '~'). Example:
Categories
Categories~Category1
Categories~Category2
I have created a class that implements FacetIndexingParams interface (a copy of DefaultFacetIndexingParams with the DEFAULT_FACET_DELIM_CHAR param set to '~').
Paraphrased indexing code : (using FSDirectory for both index and taxonomy)
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34)
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34, analyzer)
IndexWriter writer = new IndexWriter(indexDir, config)
TaxonomyWriter taxo = new LuceneTaxonomyWriter(taxDir, OpenMode.CREATE)
Document doc = new Document()
// Add bunch of Fields... hidden for the sake of brevity
List<CategoryPath> categories = new ArrayList<CategoryPath>()
row.tags.split('\\|').each{ tag ->
def cp = new CategoryPath()
tag.split('~').each{
cp.add(it)
}
categories.add(cp)
}
NewFacetIndexingParams facetIndexingParams = new NewFacetIndexingParams()
DocumentBuilder categoryDocBuilder = new CategoryDocumentBuilder(taxo, facetIndexingParams)
categoryDocBuilder.setCategoryPaths(categories).build(doc)
writer.addDocument(doc)
// Commit and close both writer and taxo.
Search code paraphrased:
// Create index and taxonomoy readers to get info from index and taxonomy
IndexReader indexReader = IndexReader.open(indexDir)
TaxonomyReader taxo = new LuceneTaxonomyReader(taxDir)
Searcher searcher = new IndexSearcher(indexReader)
QueryParser parser = new QueryParser(Version.LUCENE_34, "content", new StandardAnalyzer(Version.LUCENE_34))
parser.setAllowLeadingWildcard(true)
Query q = parser.parse(query)
TopScoreDocCollector tdc = TopScoreDocCollector.create(10, true)
List<FacetResult> res = null
NewFacetIndexingParams facetIndexingParams = new NewFacetIndexingParams()
FacetSearchParams facetSearchParams = new FacetSearchParams(facetIndexingParams)
CountFacetRequest cfr = new CountFacetRequest(new CategoryPath(""), 99)
cfr.setDepth(2)
cfr.setSortBy(SortBy.VALUE)
facetSearchParams.addFacetRequest(cfr)
FacetsCollector facetsCollector = new FacetsCollector(facetSearchParams, indexReader, taxo)
def cp = new CategoryPath("Category~Category1", (char)'~')
searcher.search(DrillDown.query(q, cp), MultiCollector.wrap(tdc, facetsCollector))
The results always return a list of facets in the form of "Category/Category1".
I have used the Luke tool to look at the index and it appears the facets are being delimited by the '~' character in the index.
What is the best route to do this? Any help is greatly appreciated!
I have figured out the issue. The search and indexing are working as they are supposed to. It is how I have been getting the facet results that is the issue. I was using :
res = facetsCollector.getFacetResults()
res.each{ result ->
result.getFacetResultNode().getLabel().toString()
}
What I needed to use was :
res = facetsCollector.getFacetResults()
res.each{ result ->
result.getFacetResultNode().getLabel().toString((char)'~')
}
The difference being the paramter sent to the toString function!
Easy to overlook, tough to find.
Hope this helps others.

Categories