Determine which parameter failed in a Lucene BooleanQuery? - java

I need to determine which part of a Lucene BooleanQuery failed if the entire query returns no results.
I'm using a BooleanQuery made up of 4 NumericRangeQueries and a PhraseQuery. Each is added to the query with Occur.MUST.
If I don't get any results for a query, is there a way to tell which part of the query failed to match anything? Do I need to run queries individually and compare results to get the one that failed?
Edit - Added PhraseQuery code.
if( row.getPropertykey_tx() != null && !row.getPropertykey_tx().trim().isEmpty()){
    PhraseQuery pQuery = new PhraseQuery();
    String[] words = row.getPropertykey_tx().trim().split(" ");
    for( String word : words ){
        pQuery.add(new Term(TitleRecordColumns.SA_SITE_ADDR.toString(), word));
    }
    pQuery.setSlop(2);
    topBQuery.add(pQuery, BooleanClause.Occur.MUST);
}

Running individual parts of the query is probably the simplest approach, to my mind.
Another available tool is the Explanation. You can call IndexSearcher.explain to get an Explanation of the scoring for the query against a particular document. If you can provide the docid of a document you believe should match the query, you can analyze Explanation.toString (or toHtml, if you prefer) to determine which subqueries are not matching against it.
If you want to automatically keep a record of which clause of a BooleanQuery doesn't produce results, I believe you will need to run each query independently. If you no longer have access to the subqueries used to create it, you can get the clauses of it instead:
void findTroublesomeQuery(BooleanQuery query) {
    for (BooleanClause clause : query.clauses()) {
        Query subquery = clause.getQuery();
        TopDocs docs = searchHoweverYouDo(subquery); // placeholder for your own search call
        // totalHits is a plain number in older Lucene versions; newer ones expose totalHits.value
        if (docs.totalHits == 0) {
            // If you want to dig down recursively...
            if (subquery instanceof BooleanQuery)
                findTroublesomeQuery((BooleanQuery) subquery);
            else
                log(subquery); // Or do whatever you want to keep track of it.
        }
    }
}
DisjunctionMaxQuery is a commonly used query that wraps multiple subqueries as well, so it might be worth handling in this sort of approach too.
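The per-clause check described above can be illustrated without Lucene. The following is a minimal sketch in plain Java that models each MUST clause as a named predicate over in-memory documents and reports the clauses that match nothing on their own. ClauseCheck, Doc, emptyClauses, and the sample data are hypothetical names made up for this illustration; a real implementation would run each subquery through an IndexSearcher instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java model (not the Lucene API): each MUST clause is a named predicate.
class ClauseCheck {

    // Hypothetical stand-in for an indexed document.
    static class Doc {
        final int year;
        final String addr;
        Doc(int year, String addr) { this.year = year; this.addr = addr; }
    }

    // Returns the names of the clauses that match no document when run alone.
    static List<String> emptyClauses(Map<String, Predicate<Doc>> clauses, List<Doc> docs) {
        List<String> failing = new ArrayList<>();
        for (Map.Entry<String, Predicate<Doc>> clause : clauses.entrySet()) {
            long hits = docs.stream().filter(clause.getValue()).count();
            if (hits == 0) {
                failing.add(clause.getKey());
            }
        }
        return failing;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(new Doc(1999, "12 main st"), new Doc(2005, "7 oak ave"));

        Map<String, Predicate<Doc>> clauses = new LinkedHashMap<>();
        clauses.put("year in [2000, 2010]", d -> d.year >= 2000 && d.year <= 2010);
        clauses.put("addr contains elm", d -> d.addr.contains("elm"));

        // Only the address clause matches nothing, so it is the one reported.
        System.out.println(emptyClauses(clauses, docs)); // prints [addr contains elm]
    }
}
```

Keeping the clause names next to the predicates makes the report readable; with Lucene, Query.toString() on each subquery would play the same role.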

Related

when is criteria better than HQL or nativeSQL Query?

In my case, after getting a certain list I need to iterate over that list to set some other fields of the POJO class.
if (transportHeaderList.get(i) instanceof TransportHeaderIiss){
    transHeadIiss=(TransportHeaderIiss)transportHeaderList.get(i);
    customerVendor= tOManagementDAO.getVendorCode(transHeadIiss.getCustVendUid());
}
if(customerVendor!=null){
    transHeadIiss.setVendorCode(customerVendor.getCustVendCode());
}
The above code calls the getVendorCode method to get the custVendCode value from the database. The code for getVendorCode is as follows:
public CustomerVendorIiss getVendorCode(Long custVendUid) {
    List list=new ArrayList();
    /* Criteria criteria = sessionFactory.getCurrentSession().createCriteria(CustomerVendorIiss.class);
    criteria.add(Restrictions.eq("companyCode",user.getDefaultCompany().getCompanyCode()));
    if(custVendUid!=null && custVendUid.intValue()>0)
    {
        criteria.add(Restrictions.eq("custVendUid",custVendUid));
    }
    list=criteria.list();*/
    UsersIiss user= ApplicationContextProvider.getLoggedInUser();
    String sqlQuery="select custVendCode as custVendCode from CustomerVendorIiss where companyCode ='"+ user.getDefaultCompany().getCompanyCode() +"' and custVendUid= "+custVendUid;
    Query query = sessionFactory.getCurrentSession().createQuery(sqlQuery);
    query.setResultTransformer(Transformers.aliasToBean(CustomerVendorIiss.class));
    list=query.list();
    if(list.size()>0){
        return (CustomerVendorIiss)list.get(0);
    }else{
        return null;
    }
}
When I executed the above code with Criteria, it took a lot of time to get the values from the table and set them on the POJO class, and sometimes I would get a java.lang.OutOfMemoryError: Java heap space error. I guess that's because I am not de-allocating the criteria object.
When I executed the above code using the createQuery() method, I did not run into that issue, and the whole process of getting and setting was faster.
I want to understand what I am doing wrong here.
It would be great to know how and when Criteria is better and when HQL is better.
Thank you!
Actually these queries are different. The second one has an additional restriction
companyCode ='"+ user.getDefaultCompany().getCompanyCode() +"'
So try to add the same to the criteria
criteria.add(Restrictions.eq("companyCode",user.getDefaultCompany().getCompanyCode()));
Also, it's not good to concatenate strings this way to build the query, because SQL injection becomes possible. Use parameters instead.
Criteria and HQL are better than native SQL in one case: when you need database-independent logic so that you can swap the database when necessary without rewriting code.
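To make the injection risk concrete, here is a small plain-Java sketch that contrasts the concatenated query text from the question with a parameterized version. HqlParams, concatenated, params, and the sample values are hypothetical names for this illustration. With Hibernate, the named parameters would be bound through Query.setParameter(name, value) rather than kept in a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: user-controlled input changes a concatenated query,
// while a parameterized query keeps its text fixed and carries values separately.
class HqlParams {

    // Same shape as the concatenated query in the question (unsafe).
    static String concatenated(String companyCode, long custVendUid) {
        return "select custVendCode as custVendCode from CustomerVendorIiss"
                + " where companyCode ='" + companyCode + "' and custVendUid= " + custVendUid;
    }

    // Query text with named placeholders; it never contains user input.
    static final String PARAMETERIZED =
            "select custVendCode as custVendCode from CustomerVendorIiss"
            + " where companyCode = :companyCode and custVendUid = :custVendUid";

    // Values travel next to the query text instead of inside it.
    static Map<String, Object> params(String companyCode, long custVendUid) {
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("companyCode", companyCode);
        p.put("custVendUid", custVendUid);
        return p;
    }

    public static void main(String[] args) {
        String hostile = "X' or '1'='1";
        // The extra condition becomes part of the query text.
        System.out.println(concatenated(hostile, 42L));
        // The query text is unchanged; the hostile string is just a value.
        System.out.println(PARAMETERIZED + " " + params(hostile, 42L));
    }
}
```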

MongoDB - How to get the count for a find query

I cannot for the life of me find out how to get a count for a find query using the java driver in mongo db. Can someone please put me out of my misery?
I have the following:
MongoCursor<Document> findRes = collection.find().iterator();
But there is no count method that I can find anywhere.
public Long getTotalCount(String collectionName, Document filterDocument) {
    MongoCollection collection = database.getCollection(collectionName);
    return filterDocument != null ? collection.count(filterDocument) : collection.count();
}
Here filterDocument is an org.bson.Document with the filter criteria, or null if you want the total count.
You may also use the more powerful Filters class. Example: collection.count(Filters.and(Filters.eq("field","value"),second condition and so on));
So, in order to accept both Document and Filters as the parameter, you may change the signature to public Long getTotalCount(String collectionName, Bson filterDocument) {
long rows = db.getCollection(myCollection).count(new Document("_id", 10)) ;
This is in Java; myCollection is the collection name.
MongoDB has a built-in count() method that can be called on a cursor to find the number of documents returned.
I tried the following piece of code in the mongo shell, and it worked well; it can easily be applied in Java or any other language too:
var findres = db.c.find()
findres.count() gave output 29353
cursor.count() is what you're looking for I believe. Your find query returns a Cursor so you can just call count() on that.

How can I use ExecutionEngine to get a list of relationships in neo4j?

I'm using ExecutionEngine in Java to run Cypher queries against a Neo4j database.
I'd like to get all relationships that exist for a node.
My raw cypher would be:
MATCH (n:Phone{id:'you'}) MATCH n-[r:calling]->m WHERE n<>m RETURN n, r, m
I see plenty of examples online that describe how I can get nodes from a query result, but I'd like to return both the nodes n and m as well as the relationship r.
Do I need to do anything different than if I was just returning nodes?
Here's how you execute cypher queries from Java.
Here's some code for how you'd get those relationships. I haven't tested this, but it's the right general approach.
String query = "MATCH (n:Phone{id:'you'}) MATCH n-[r:calling]->m WHERE n<>m RETURN n, r, m";
ExecutionEngine engine = new ExecutionEngine( db );
ExecutionResult result;
try ( Transaction ignored = db.beginTx() ) {
    result = engine.execute(query);
    ResourceIterator<Relationship> rels = result.columnAs("r");
    while(rels.hasNext()) {
        Relationship r = rels.next();
        // Do something cool here.
    }
} catch(Exception exc) { System.err.println("ERHMAGEHRD!!!"); }
Basically, use the columnAs() method to get a result column. Note that here it's "r" because your query returns the relationships in a variable with that name.
OK, now for your question about the query. In java, I like to return as little as possible from queries. If you need it, it should be in the return clause. If you don't, then it shouldn't be.
If you want the relationships, then return them. Don't try to get at relationships by returning the nodes, then looking from there. That approach will work, but just going straight for the relationships makes more sense.

Fuzzy search on a phrase using Hibernate Search

I am using Hibernate Search to search for titles of tv shows on my web app.
I can use the method fuzzy() on keyword() in order to perform fuzzy searches on keywords, but I need to take into account the whole title, so I am using phrase() instead of keyword(). The method fuzzy() is not defined for phrase(), so I was wondering if there is an easy way to achieve fuzzy searches on phrases using Hibernate Search.
If you just need a PhraseQuery with slop (that is, extra words thrown in), then you can set the slop on a phrase query like:
queryBuilder.phrase()
    .withSlop(2)
    .onField("myField")
    .sentence("this sentence missing something")
    .createQuery();
However, I'm not aware of anything in the Hibernate APIs that supports embedding fuzzy queries in phrases, but in Lucene, you can work with the SpanQuery API to build that. SpanMultiTermQueryWrapper and SpanNearQuery, in particular, are what you would need. Something like:
FuzzyQuery query1 = new FuzzyQuery(new Term("field", "fuzy"));
FuzzyQuery query2 = new FuzzyQuery(new Term("field", "phrse"));
// The wrappers must be typed as SpanQuery to be usable as SpanNearQuery clauses.
SpanQuery wrappedQuery1 = new SpanMultiTermQueryWrapper<FuzzyQuery>(query1);
SpanQuery wrappedQuery2 = new SpanMultiTermQueryWrapper<FuzzyQuery>(query2);
SpanQuery[] clauses = {wrappedQuery1, wrappedQuery2};
SpanNearQuery query = new SpanNearQuery(clauses, 0, true); // slop 0, terms in order
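To show what such a query is meant to match, here is a rough plain-Java model of "fuzzy terms, adjacent and in order". It is a sketch under simplifying assumptions, not Lucene's implementation: it uses plain Levenshtein distance, and FuzzyPhrase, editDistance, and matchesPhrase are hypothetical names.

```java
// Toy model: a phrase matches when consecutive tokens are each within
// maxEdits of the corresponding query term (slop 0, in order).
class FuzzyPhrase {

    // Classic dynamic-programming Levenshtein distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // True if some run of adjacent tokens fuzzily matches all terms in order.
    static boolean matchesPhrase(String[] tokens, String[] terms, int maxEdits) {
        for (int start = 0; start + terms.length <= tokens.length; start++) {
            boolean all = true;
            for (int k = 0; k < terms.length; k++) {
                if (editDistance(tokens[start + k], terms[k]) > maxEdits) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] terms = {"fuzy", "phrse"};
        System.out.println(matchesPhrase("a fuzzy phrase here".split(" "), terms, 1)); // true
        System.out.println(matchesPhrase("phrase fuzzy".split(" "), terms, 1));        // false
    }
}
```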

How do I use boolean operators with Hibernate Search

I'm learning the Hibernate Search Query DSL, and I'm not sure how to construct queries using boolean arguments such as AND or OR.
For example, let's say that I want to return all person records that have a firstName value of "bill" or "bob".
Following the hibernate docs, one example uses the bool() method w/ two subqueries, such as:
QueryBuilder b = fts.getSearchFactory().buildQueryBuilder().forEntity(Person.class).get();
Query luceneQuery = b.bool()
    .should(b.keyword().onField("firstName").matching("bill").createQuery())
    .should(b.keyword().onField("firstName").matching("bob").createQuery())
    .createQuery();
logger.debug("query 1:{}", luceneQuery.toString());
This ultimately produces the lucene query that I want, but is this the proper way to use boolean logic with hibernate search? Is "should()" the equivalent of "OR" (similarly, does "must()" correspond to "AND")?.
Also, writing a query this way feels cumbersome. For example, what if I had a collection of firstNames to match against? Is this type of query a good match for the DSL in the first place?
Yes, your example is correct. The boolean operators are called should instead of OR because of the names they have in the Lucene API and documentation, and because the name is more appropriate: a should clause not only influences a boolean decision, it also affects the scoring of the result.
For example, if you search for cars "of brand Fiat" OR "blue", the cars that are both Fiat and blue will also be returned, with a higher score than those that are blue but not Fiat.
It might feel cumbersome because it's programmatic and provides many detailed options. A simpler alternative is to write your query as a plain string and use the QueryParser to create the query. Generally the parser is useful for parsing user input, while the programmatic API is easier to use with well-defined fields; for example, if you have the collection you mentioned, it's easy to build the query in a for loop.
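The Fiat/blue example can be modeled in a few lines of plain Java. This is only a sketch of the idea that every matching should clause adds to the score; real Lucene scoring is more involved, and ShouldScoring, Car, and score are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model: the score is the number of SHOULD clauses a document matches.
// A document counts as a hit when at least one clause matches (score > 0).
class ShouldScoring {

    static class Car {
        final String brand;
        final String color;
        Car(String brand, String color) { this.brand = brand; this.color = color; }
    }

    static int score(Car car, List<Predicate<Car>> shouldClauses) {
        int score = 0;
        for (Predicate<Car> clause : shouldClauses) {
            if (clause.test(car)) {
                score++;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        List<Predicate<Car>> clauses = new ArrayList<>();
        clauses.add(c -> c.brand.equals("Fiat"));
        clauses.add(c -> c.color.equals("blue"));

        System.out.println(score(new Car("Fiat", "blue"), clauses)); // 2: matches both, ranked first
        System.out.println(score(new Car("Ford", "blue"), clauses)); // 1: still a hit, ranked lower
        System.out.println(score(new Car("Ford", "red"), clauses));  // 0: not a hit
    }
}
```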
You can also use BooleanQuery directly. I would prefer this because you can build it in a loop over a list.
org.hibernate.search.FullTextQuery hibque = null;
org.apache.lucene.search.BooleanQuery bquery = new BooleanQuery();
QueryBuilder qb = fulltextsession.getSearchFactory().buildQueryBuilder()
    .forEntity(entity.getClass()).get();
for (String keyword : list) {
    bquery.add(qb.keyword().wildcard().onField(entityColumn).matching(keyword)
        .createQuery() , BooleanClause.Occur.SHOULD);
}
if (!filterColumn.equals("") && !filterValue.equals("")) {
    bquery.add(qb.keyword().wildcard().onField(column).matching(value).createQuery()
        , BooleanClause.Occur.MUST);
}
hibque = fulltextsession.createFullTextQuery(bquery, entity.getClass());
int num = hibque.getResultSize();
To answer your secondary question:
For example, what if I had a collection of firstNames to match against?
I'm not an expert, but according to (the third example from the end of) 5.1.2.1. Keyword queries in Hibernate Search Documentation, you should be able to build the query like so:
Collection<String> namesCollection = getNames(); // Contains "billy" and "bob", for example
StringBuilder names = new StringBuilder(100);
for(String name : namesCollection) {
    names.append(name).append(" "); // Never mind the space at the end of the resulting string.
}
QueryBuilder b = fts.getSearchFactory().buildQueryBuilder().forEntity(Person.class).get();
Query luceneQuery = b.bool()
    .should(
        // Searches for multiple possible values in the same field
        b.keyword().onField("firstName").matching( names.toString() ).createQuery()
    )
    .must(b.keyword().onField("lastName").matching("thornton").createQuery())
    .createQuery();
and get, as a result, Persons with (firstName preferably "billy" or "bob") AND (lastName = "thornton"), although I don't think it will give good ol' Billy Bob Thornton a higher score ;-).
I had a similar problem, but a somewhat different one than presented here: I was looking for an actual OR junction. The should case didn't work for me, because results that didn't match either of the two expressions were still returned, only with a lower score. I wanted to omit these results completely. You can, however, create an actual boolean OR expression by using a separate boolean expression for which you disable scoring:
val booleanQuery = cb.bool();
val packSizeSubQuery = cb.bool();
packSizes.stream().map(packSize -> cb.phrase()
        .onField(LUCENE_FIELD_PACK_SIZES)
        .sentence(packSize.name())
        .createQuery())
    .forEach(packSizeSubQuery::should);
booleanQuery.must(packSizeSubQuery.createQuery()).disableScoring();
val persistenceQuery = fullTextEntityManager.createFullTextQuery(booleanQuery.createQuery(), Product.class);
return persistenceQuery.getResultList();
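One way to picture why the nesting helps is the toy model below. It assumes the outer query also carries at least one other must clause (an assumption, since the snippet above does not show one), in which case Lucene treats sibling should clauses as optional. NestedOr, Product, flat, and nested are hypothetical names, and the predicates stand in for real queries.

```java
import java.util.function.Predicate;

// Toy model of BooleanQuery matching (not the Hibernate Search or Lucene API).
class NestedOr {

    static class Product {
        final boolean inStock;
        final String packSize;
        Product(boolean inStock, String packSize) { this.inStock = inStock; this.packSize = packSize; }
    }

    // Flat: must(m), should(a), should(b). Once a MUST clause is present,
    // the SHOULD clauses only influence ranking, so matching depends on m alone.
    static Predicate<Product> flat(Predicate<Product> m, Predicate<Product> a, Predicate<Product> b) {
        return m;
    }

    // Nested: must(m), must(bool(should(a), should(b))). The inner query has only
    // SHOULD clauses, so it needs at least one of them to match.
    static Predicate<Product> nested(Predicate<Product> m, Predicate<Product> a, Predicate<Product> b) {
        return m.and(a.or(b));
    }

    public static void main(String[] args) {
        Predicate<Product> inStock = p -> p.inStock;
        Predicate<Product> small = p -> p.packSize.equals("SMALL");
        Predicate<Product> medium = p -> p.packSize.equals("MEDIUM");
        Product large = new Product(true, "LARGE");

        System.out.println(flat(inStock, small, medium).test(large));   // true: returned despite wrong size
        System.out.println(nested(inStock, small, medium).test(large)); // false: omitted
    }
}
```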
