How to do a multi-field phrase search in Lucene?

The title says it all: I want to do a multi-field phrase search in Lucene. How do I do it?
For example:
I have fields as String s[] = {"title","author","content"};
I want to search for the phrase harry potter across all fields. How do I do it?
Can someone please provide an example snippet?

Use MultiFieldQueryParser; it's a QueryParser that constructs queries to search multiple fields.
Another way is to create a BooleanQuery consisting of TermQuery clauses (in your case, PhraseQuery clauses).
A third way is to include the content of the other fields in your default content field.
Add
Generally speaking, querying on multiple fields isn’t the best practice for user-entered queries. More commonly, all words you want searched are indexed into a contents or keywords field by combining various fields.
Update
Usage:
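// Each query string is paired with the field at the same index; the inner quotes make each one a phrase query.
Query query = MultiFieldQueryParser.parse(Version.LUCENE_30, new String[] {"\"harry potter\"","\"harry potter\"","\"harry potter\""}, new String[] {"title","author","content"},new SimpleAnalyzer());
IndexSearcher searcher = new IndexSearcher(...);
// Hits was removed in Lucene 3.0; search(query, n) returns the top n matches as TopDocs.
TopDocs hits = searcher.search(query, 10);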
The MultiFieldQueryParser resolves the query as follows (see the Javadoc):
Parses a query which searches on the fields specified. If x fields are specified, this effectively constructs:
(field1:query1) (field2:query2) (field3:query3)...(fieldx:queryx)
Hope this helps.

Intensified googling revealed this:
http://lucene.472066.n3.nabble.com/Phrase-query-on-multiple-fields-td2292312.html
Since it is the latest and best, I'll go with that approach, I guess. Nevertheless, it might help someone who is looking for the same thing I am.

You need to use MultiFieldQueryParser with the query string wrapped in double quotes, so that it is parsed as a phrase. I have tested it with Lucene 8.8.1 and it works like magic.
String queryStr = "harry potter";
queryStr = "\"" + queryStr.trim() + "\"";
Query query = new MultiFieldQueryParser(new String[]{"title","author","content"}, new StandardAnalyzer()).parse(queryStr);
System.out.println(query);
It will print:
(title:"harry potter") (author:"harry potter") (content:"harry potter")
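One thing the snippet above leaves open is user input that itself contains a double quote, which would end the phrase early. The following is a minimal plain-JDK sketch of a quoting step, assuming the classic query parser's backslash escaping applies inside quoted phrases. The class and method names (PhraseQuoting, quote) are hypothetical and not part of Lucene.

```java
/** Hypothetical helper (not part of Lucene) that turns raw user input into a quoted phrase. */
final class PhraseQuoting {

    /** Trims the input, backslash-escapes embedded backslashes and double quotes, and wraps it in quotes. */
    static String quote(String userInput) {
        String escaped = userInput.trim()
                .replace("\\", "\\\\")   // escape backslashes first so they are not doubled twice
                .replace("\"", "\\\"");  // then escape embedded double quotes
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // prints "harry potter" (including the surrounding double quotes)
        System.out.println(quote(" harry potter "));
    }
}
```

The returned string would then be handed to MultiFieldQueryParser.parse(...) in place of the hand-built queryStr.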

Related

Fuzzy search on a phrase using Hibernate Search

I am using Hibernate Search to search for titles of tv shows on my web app.
I can use the method fuzzy() on keyword() in order to perform fuzzy searches on keywords, but I need to take into account the whole title, so I am using phrase() instead of keyword(). The method fuzzy() is not defined for phrase(), so I was wondering if there is an easy way to achieve fuzzy searches on phrases using Hibernate Search.
If you just need a PhraseQuery with slop (that is, extra words thrown in), then you can set the slop on a phrase query like:
queryBuilder.phrase()
.withSlop(2)
.onField("myField")
.sentence("this sentence missing something")
.createQuery();
However, I'm not aware of anything in the Hibernate APIs that supports embedding fuzzy queries in phrases, but in Lucene, you can work with the SpanQuery API to build that. SpanMultiTermQueryWrapper and SpanNearQuery, in particular, are what you would need. Something like:
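FuzzyQuery query1 = new FuzzyQuery(new Term("field", "fuzy"));
FuzzyQuery query2 = new FuzzyQuery(new Term("field", "phrse"));
// Wrap each fuzzy query so that it can take part in a span query.
SpanQuery wrappedQuery1 = new SpanMultiTermQueryWrapper<FuzzyQuery>(query1);
SpanQuery wrappedQuery2 = new SpanMultiTermQueryWrapper<FuzzyQuery>(query2);
SpanQuery[] clauses = {wrappedQuery1, wrappedQuery2};
// Slop 0 and inOrder = true: the two fuzzy terms must be adjacent and in order, like a phrase.
SpanNearQuery phraseQuery = new SpanNearQuery(clauses, 0, true);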

Hibernate Query To Search Content in a Forum Questions

I want to search the contents of a forum, especially forum questions.
For example:
searchString = "Hibernate Session Configuration";
should return the matching forum questions.
The words need not be consecutive in the forum content, so I am storing the search string in a java.util.Set containing each word:
String[] searchArray= searchString.toLowerCase().split(" ");
Set<String> searchSet = new HashSet<String>();
// searchSet contains words of searchString
for(String string : searchArray){
searchSet.add(string);
}
I wrote the Hibernate query as:
DetachedCriteria detachedCriteria = DetachedCriteria.forClass(ForumQuestion.class);
for(String searchUnitString : searchSet)
{
detachedCriteria= detachedCriteria.add(Restrictions.disjunction().add(Restrictions.ilike("forumQuestion", "%"+searchUnitString+"%")));
}
return template.findByCriteria(detachedCriteria);
But this query is not working properly: it just takes the last Restriction and ignores the previous ones!
In this example, it considers only '%Configuration%', but I need '%Hibernate%' or '%Session%' or '%Configuration%' together.
Note: if I query for each word separately, the number of database hits will be high.
You're not adding a disjunction. You're adding N disjunctions containing only one restriction each. The code should be:
DetachedCriteria detachedCriteria = DetachedCriteria.forClass(ForumQuestion.class);
Disjunction disjunction = Restrictions.disjunction();
for (String searchUnitString : searchSet) {
disjunction.add(Restrictions.ilike("forumQuestion", "%"+searchUnitString+"%"));
}
detachedCriteria.add(disjunction);
return template.findByCriteria(detachedCriteria);
Note that unless you have few questions in your forum, these searches will be slow. SQL queries are not the best way to handle full-text search. I would look at Lucene (which Hibernate Search uses, BTW) for such a task.
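A side note on the word splitting in the question: split(" ") yields an empty string when the input contains two consecutive spaces, and the resulting pattern "%%" matches every row, which would make the whole disjunction match everything. Below is a plain-JDK sketch of a more defensive tokenization step. The class and method names (SearchTerms, words, likePatterns) are made up for illustration, and the patterns are meant to be fed to Restrictions.ilike(...) as in the answer above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for turning a search string into LIKE patterns. */
final class SearchTerms {

    /** Splits on runs of whitespace, lower-cases, and drops blanks and duplicates (first-seen order kept). */
    static Set<String> words(String searchString) {
        Set<String> words = new LinkedHashSet<>();
        for (String word : searchString.trim().toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    /** Builds one "%word%" pattern per unique word. */
    static List<String> likePatterns(String searchString) {
        List<String> patterns = new ArrayList<>();
        for (String word : words(searchString)) {
            patterns.add("%" + word + "%");
        }
        return patterns;
    }

    public static void main(String[] args) {
        // prints [%hibernate%, %session%, %configuration%]
        System.out.println(likePatterns("Hibernate  Session Configuration"));
    }
}
```

This sketch does not escape % or _ typed by the user; whether that matters depends on how the search box is meant to behave.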

How do I use boolean operators with Hibernate Search

I'm learning the Hibernate Search Query DSL, and I'm not sure how to construct queries using boolean arguments such as AND or OR.
For example, let's say that I want to return all person records that have a firstName value of "bill" or "bob".
Following the hibernate docs, one example uses the bool() method w/ two subqueries, such as:
QueryBuilder b = fts.getSearchFactory().buildQueryBuilder().forEntity(Person.class).get();
Query luceneQuery = b.bool()
.should(b.keyword().onField("firstName").matching("bill").createQuery())
.should(b.keyword().onField("firstName").matching("bob").createQuery())
.createQuery();
logger.debug("query 1:{}", luceneQuery.toString());
This ultimately produces the Lucene query that I want, but is this the proper way to use boolean logic with Hibernate Search? Is "should()" the equivalent of "OR" (and, similarly, does "must()" correspond to "AND")?
Also, writing a query this way feels cumbersome. For example, what if I had a collection of firstNames to match against? Is this type of query a good match for the DSL in the first place?
Yes, your example is correct. The boolean operators are called should instead of OR because of the names they have in the Lucene API and documentation, and because it is more appropriate: a should clause does not only influence a boolean decision, it also affects the scoring of the result.
For example, if you search for cars "of brand Fiat" OR "blue", the cars that are branded Fiat AND blue will also be returned, with a higher score than those which are blue but not Fiat.
It might feel cumbersome because it's programmatic and provides many detailed options. A simpler alternative is to use a plain string for your query and let the QueryParser create the query. Generally, the parser is useful for parsing user input, while the programmatic API is easier to use with well-defined fields; for example, if you have the collection you mentioned, it's easy to build the query in a for loop.
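For the string-based alternative, the query text for a collection of names can be assembled with plain JDK code before it is handed to a QueryParser. This is only a sketch: the class and method names (OrQueryString, anyOf) are hypothetical, and it assumes the classic Lucene query syntax (field:"value" clauses joined by OR, with backslash escaping inside quotes).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.StringJoiner;

/** Hypothetical helper that builds an OR query string for one field. */
final class OrQueryString {

    /** Joins field:"value" clauses with OR; returns an empty string for an empty collection. */
    static String anyOf(String field, Collection<String> values) {
        StringJoiner clauses = new StringJoiner(" OR ");
        for (String value : values) {
            // Escape backslashes first, then double quotes, so the value stays inside its quotes.
            String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
            clauses.add(field + ":\"" + escaped + "\"");
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        // prints firstName:"bill" OR firstName:"bob"
        System.out.println(anyOf("firstName", Arrays.asList("bill", "bob")));
    }
}
```

The parser still runs each value through the analyzer it was constructed with, so the terms that end up in the query depend on that choice.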
You can also use BooleanQuery. I would prefer this because you can use it in a loop over a list.
org.hibernate.search.FullTextQuery hibque = null;
org.apache.lucene.search.BooleanQuery bquery = new BooleanQuery();
QueryBuilder qb = fulltextsession.getSearchFactory().buildQueryBuilder()
.forEntity(entity.getClass()).get();
for (String keyword : list) {
bquery.add(qb.keyword().wildcard().onField(entityColumn).matching(keyword)
.createQuery() , BooleanClause.Occur.SHOULD);
}
if (!filterColumn.equals("") && !filterValue.equals("")) {
bquery.add(qb.keyword().wildcard().onField(filterColumn).matching(filterValue).createQuery()
, BooleanClause.Occur.MUST);
}
hibque = fulltextsession.createFullTextQuery(bquery, entity.getClass());
int num = hibque.getResultSize();
To answer your secondary question:
For example, what if I had a collection of firstNames to match against?
I'm not an expert, but according to (the third example from the end of) 5.1.2.1. Keyword queries in Hibernate Search Documentation, you should be able to build the query like so:
Collection<String> namesCollection = getNames(); // Contains "billy" and "bob", for example
StringBuilder names = new StringBuilder(100);
for(String name : namesCollection) {
names.append(name).append(" "); // Never mind the space at the end of the resulting string.
}
QueryBuilder b = fts.getSearchFactory().buildQueryBuilder().forEntity(Person.class).get();
Query luceneQuery = b.bool()
.should(
// Searches for multiple possible values in the same field
b.keyword().onField("firstName").matching( names.toString() ).createQuery()
)
.must(b.keyword().onField("lastName").matching("thornton").createQuery())
.createQuery();
and, as a result, get Persons with (firstName preferably "billy" or "bob") AND (lastName = "thornton"), although I don't think it will give the good ol' Billy Bob Thornton a higher score ;-).
I was looking into the same topic but have a somewhat different issue than the one presented. I was looking for an actual OR junction. The should case didn't work for me, because results that matched neither of the two expressions were still returned, only with a lower score. I wanted to omit these results completely. You can, however, create an actual boolean OR expression by using a separate boolean expression for which you disable scoring:
val booleanQuery = cb.bool();
val packSizeSubQuery = cb.bool();
packSizes.stream().map(packSize -> cb.phrase()
.onField(LUCENE_FIELD_PACK_SIZES)
.sentence(packSize.name())
.createQuery())
.forEach(packSizeSubQuery::should);
booleanQuery.must(packSizeSubQuery.createQuery()).disableScoring();
val persistenceQuery = fullTextEntityManager.createFullTextQuery(booleanQuery.createQuery(), Product.class);
return persistenceQuery.getResultList();

Can I get the string from preparedStatement.setString()?

I have a problem: I create my SQL queries dynamically, based on user input options. The user has 5 parameters (actually more) and can choose to use some of them, all of them, or none, and specify their values in the query. So I construct my query String (basically the WHERE conditions) by checking whether a parameter was selected and whether a value was provided. However, now there is the problem of special characters like '. I could try to use replaceAll("'", "\\") but this is quite dull, and I know that preparedStatement.setString() does the job better. However, I would then need to check again whether each parameter was provided, and whether the previous ones were too (to determine the position of each ? and connect it to the right parameter). This causes a lot of combinations and does not look elegant.
So my question is: can I somehow get the string that preparedStatement.setString() produces? Or is there a similar function that would do the same job and give me the String so I can put it in the query manually?
Maybe the intro was too long but someone might have a better idea and I wanted to explain why I need it.
What you can do is construct the basic, unparameterized SQL query based on whether the parameters were specified, and then use the prepared statement to fill in the parameters.
It could look something like this (rough sketch):
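Map<String, Object> parameterValues = /*from user*/;
List<String> parameterNames = Arrays.asList("field1", "field2", "field3");
List<Object> valueList = new ArrayList<Object>();
// "1 = 1" keeps the statement valid when no parameter was supplied.
StringBuilder statementBuilder = new StringBuilder("select * from table where 1 = 1");
for ( String parameterName : parameterNames ) {
if ( parameterValues.containsKey(parameterName) ) {
// Only the column name (from our own list) goes into the SQL text; the value is bound below.
statementBuilder.append(" AND " + parameterName + " = ?");
valueList.add(parameterValues.get(parameterName));
}
}
PreparedStatement st = conn.prepareStatement(statementBuilder.toString());
// Bind each value in the order its placeholder was appended (JDBC indexes start at 1).
for ( int i = 0; i < valueList.size(); i++ ) {
st.setObject(i + 1, valueList.get(i));
}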
It's only hard the first time; then you can make it generic. That said there are probably query builders that abstract all of this away for you. I use QueryDSL but that does not have bindings for pure JDBC but rather JPA and JDO, etc.
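The text-building half of this idea can be tried without a database connection. The sketch below is one possible generic version, using hypothetical names (DynamicWhere, Built, build) and assuming all conditions are simple equality checks joined by AND. It returns the SQL text together with the values in placeholder order, and leaves out the WHERE clause entirely when nothing was supplied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical builder for "column = ?" conditions; binding the values is left to the caller. */
final class DynamicWhere {

    /** The SQL text plus the values to bind, in the same order as the ? placeholders. */
    static final class Built {
        final String sql;
        final List<Object> values;

        Built(String sql, List<Object> values) {
            this.sql = sql;
            this.values = values;
        }
    }

    /** Adds one condition per allowed column that the user actually supplied. */
    static Built build(String baseSql, List<String> allowedColumns, Map<String, Object> supplied) {
        StringJoiner conditions = new StringJoiner(" AND ");
        List<Object> values = new ArrayList<>();
        for (String column : allowedColumns) {
            if (supplied.containsKey(column)) {
                conditions.add(column + " = ?"); // column names come from our own list, never from the user
                values.add(supplied.get(column));
            }
        }
        String sql = values.isEmpty() ? baseSql : baseSql + " WHERE " + conditions;
        return new Built(sql, values);
    }
}
```

A caller would pass built.sql to conn.prepareStatement(...) and then call setObject(i + 1, built.values.get(i)) for each index, so a value such as O'Brien never has to be escaped by hand.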
On another forum I was given a different, simpler and cleaner approach that works perfectly.
Here are some links for others with the same problem:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1669972300346534908
http://www.akadia.com/services/dyn_modify_where_clause.html

Lucene: queries and docs with multiple fields

I have a collection of documents consisting of several fields, and I need to perform queries with several terms coming from multiple fields.
Which do you suggest I use: MultiFieldQueryParser or MultiPhraseQuery?
Thanks
How about BooleanQuery?
http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/search/BooleanQuery.html
Choice of Analyzer
First of all, watch out for which analyzer you are using. I was stumped for a while, only to realise that the StandardAnalyzer filters out common words like 'the' and 'a'. This is a problem when your field has the value 'A'. You might want to consider the KeywordAnalyzer:
See this post around the analyzer.
// Create an analyzer:
// NOTE: We want the keyword analyzer so that it doesn't strip or alter any terms:
// In our example, the Standard Analyzer removes the term 'A' because it is a common English word.
// https://stackoverflow.com/a/9071806/231860
KeywordAnalyzer analyzer = new KeywordAnalyzer();
Query Parser
Next, you can either create your query using the QueryParser:
See this post around overriding the default operator.
// Create a query parser without a default field in this example (the first argument):
QueryParser queryParser = new QueryParser("", analyzer);
// Optionally, set the default operator to be AND (we leave it the default OR):
// https://stackoverflow.com/a/9084178/231860
// queryParser.setDefaultOperator(QueryParser.Operator.AND);
// Parse the query:
Query multiTermQuery = queryParser.parse("field_name1:\"field value 1\" AND field_name2:\"field value 2\"");
Query API
Or you can achieve the same by constructing the query yourself using their API:
See this tutorial around creating the BooleanQuery.
BooleanQuery multiTermQuery = new BooleanQuery();
multiTermQuery.add(new TermQuery(new Term("field_name1", "field value 1")), BooleanClause.Occur.MUST);
multiTermQuery.add(new TermQuery(new Term("field_name2", "field value 2")), BooleanClause.Occur.MUST);
Delete the Documents that Match the Query
Then we finally pass the query to the writer to delete documents that match the query:
See my answer here, related to this answer.
See the answer to this question.
// Remove the document by using a multi key query:
// http://www.avajava.com/tutorials/lessons/how-do-i-combine-queries-with-a-boolean-query.html
writer.deleteDocuments(multiTermQuery);
