Querying Hibernate Search with a logic pattern

I'm integrating Hibernate Search in my project, and at the moment it works fine.
Now I want to refine my search in this way: I'd like to let the user pass a query like term1 AND term2 OR term3 and so on. The number of terms can vary, of course.
So my idea is to build a proper search with logical operators to help users find what they want.

You have to separate the conditions that use AND and OR by grouping them with parentheses ().
e.g.
(term1 AND term2) OR term3
If you want to add another term, nest the groups, for example: ((term1 AND term2) OR term3) AND term4
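The grouping rule above can be sketched with a small helper that always wraps each combined condition in parentheses, so nesting never becomes ambiguous. This is only an illustration: the class and method names are made up, and it produces a plain query string, not a Hibernate Search query.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Hibernate Search or Lucene):
// builds explicitly parenthesized boolean expressions as strings.
final class BooleanExpression {

    // Joins the operands with the operator and wraps the result in parentheses.
    static String group(String operator, String... operands) {
        return Arrays.stream(operands)
                .collect(Collectors.joining(" " + operator + " ", "(", ")"));
    }

    static String and(String... operands) {
        return group("AND", operands);
    }

    static String or(String... operands) {
        return group("OR", operands);
    }

    public static void main(String[] args) {
        // Builds the nested example from the answer, with one extra outer pair of parentheses.
        System.out.println(and(or(and("term1", "term2"), "term3"), "term4"));
    }
}
```

Because every call adds its own parentheses, the order of evaluation is always visible in the resulting string, whatever the number of terms.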

You can use this stackoverflow answer if you have one entity.
You can use a boolean query like :
// b is the QueryBuilder for the entity; in the Hibernate Search 5 DSL, exclusion is written must(...).not()
Query luceneQuery = b.bool()
    .must(b.keyword().onField("fieldName").matching("term1").createQuery())
    .must(b.keyword().onField("fieldName").matching("term2").createQuery())
    .should(b.keyword().onField("fieldName").matching("term3").createQuery())
    .must(b.keyword().onField("fieldName").matching("term4").createQuery()).not()
    .createQuery();
must: the document must match this query (like AND).
should: the document should match this query (like OR). Note that when a must clause is present, should clauses become optional and only influence the score.
must(...).not(): excludes documents that match this query (like NOT).

Related

How to handle synonyms and stop words when building a fuzzy query with Hibernate Search Query DSL

Using Hibernate Search (5.8.2.Final) Query DSL to Elasticsearch server.
Given a field analyzer that does lowercase, standard stop-words, then a custom synonym with:
company => co
and finally, a custom stop-word:
co
And we've indexed a vendor name: Great Spaulding Company, which boils down to 2 terms in Elasticsearch after synonyms and stop-words: great and spaulding.
I'm trying to build my query so that each term 'must' match, fuzzy or exact, depending on the term length.
I get the results I want except when one of the terms happens to be a synonym or stop-word and is long enough that my code adds fuzziness to it, like company~1. In that case it is no longer seen as a synonym or stop-word and my query returns no match, since 'company' was never stored in the first place: it becomes 'co' and is then removed as a stop-word.
Time for some code. It may seem a bit hacky, but I've tried numerous ways and using simpleQueryString with withAndAsDefaultOperator and building my own phrase seems to get me the closest to the results I need (but I'm open to suggestions). I'm doing something like:
// assume passed in search String of "Great Spaulding Company"
String vendorName = "Great Spaulding Company";
List<String> vendorNameTerms = Arrays.asList(vendorName.split(" "));
List<String> qualifiedTerms = Lists.newArrayList();
vendorNameTerms.forEach(term -> {
    int editDistance = getEditDistance(term); // 1..5 = 0, 6..10 = 1, > 10 = 2
    int prefixLength = getPrefixLength(term); // appears of no use with simpleQueryString
    String fuzzyMarker = editDistance > 0 ? "~" + editDistance : "";
    qualifiedTerms.add(String.format("%s%s", term, fuzzyMarker));
});
// join my terms back together with their optional fuzziness marker
String phrase = qualifiedTerms.stream().collect(Collectors.joining(" "));
bool.should(
    qb.simpleQueryString()
        .onField("vendorNames.vendorName")
        .withAndAsDefaultOperator()
        .matching(phrase)
        .createQuery()
);
As I said above, I'm finding that as long as I don't add any fuzziness to a possible synonym or stop-word, the query finds a match. So these phrases return a match:
"Great Spaulding~1" or "Great Spaulding~1 Co" or "Spaulding Co"
But since my code doesn't know what terms are synonyms or stop-words, it blindly looks at term length and says, oh, 'Company' is greater than 5 characters, I'll make it fuzzy, it builds these sorts of phrases which are NOT returning a match:
"Great Spaulding~1 Company~1" or "Great Company~1"
Why is Elasticsearch not processing Company~1 as a synonym?
Any idea on how I can make this work with simpleQueryString or another DSL query?
How is everyone handling fuzzy searching on text that may contain stopwords?
[Edit] Same issue happens with punctuation that my analyzer would normally remove. I cannot include any punctuation in the fuzzy search string in my query b/c the ES analyzer doesn't seem to treat it as it would non-fuzzy and I don't get a match result.
Example based on above search string: Great Spaulding Company., gets built in my code to the phrase Great Spaulding~1 Company.,~1 and ES doesn't remove the punctuation or recognize the synonym word Company
I'm going to try a hack of calling ES _analyze REST api in order for it to tell me what tokens I should include in the query, although this will add overhead to every query I build. Similar to http://localhost:9200/myEntity/_analyze?analyzer=vendorNameAnalyzer&text=Great Spaulding Company., produces 3 tokens: great, spaulding and company.
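One way to act on the _analyze idea, as a rough sketch: once the analyzed tokens are available, apply the length-based fuzziness to those tokens rather than to the raw input. The class and method names below are made up for illustration, the token list is assumed to have been fetched from the _analyze endpoint already (the HTTP call is not shown), and the thresholds follow the comment in the code above.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: adds a fuzziness marker to tokens that have already
// been analyzed, so synonyms, stop-words and punctuation are gone by then.
final class FuzzyPhraseBuilder {

    // Length-based rule from the question: 1..5 -> 0, 6..10 -> 1, > 10 -> 2.
    static int editDistance(String term) {
        int length = term.length();
        if (length <= 5) {
            return 0;
        }
        return length <= 10 ? 1 : 2;
    }

    // Joins the tokens into a simple query string phrase, marking long tokens as fuzzy.
    static String buildPhrase(List<String> analyzedTokens) {
        return analyzedTokens.stream()
                .map(token -> {
                    int distance = editDistance(token);
                    return distance > 0 ? token + "~" + distance : token;
                })
                .collect(Collectors.joining(" "));
    }
}
```

With this approach a term that the analyzer drops never reaches the query, at the cost of one extra round trip per search.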

Why is Elasticsearch not processing Company~1 as a synonym?
I'm going to guess it's because fuzzy queries are "term-level" queries, which means they operate on exact terms instead of analyzed text. If your term, once analyzed, resolved to multiple tokens, I don't think it would be easy to define an acceptable behavior for a fuzzy query.
There's a more detailed explanation there (I believe it still applies to the Lucene version used in Elasticsearch 5.6).
Any idea on how I can make this work with simpleQueryString or another DSL query?
How is everyone handling fuzzy searching on text that may contain stopwords?
You could try reversing your synonym: use co => company instead of company => co, so that a query such as compayn~1 will match even if "compayn" is not analyzed. But that's not a satisfying solution, of course, since other examples requiring analysis still won't work, such as Company~1.
Below are alternative solutions.
Solution 1: "match" query with fuzziness
This article describes a way to perform fuzzy searches, and in particular explains the difference between several types of fuzzy queries.
Unfortunately, it seems that fuzzy queries in "simple query string" queries are translated into the type of query that does not perform analysis.
However, depending on your requirements, the "match" query may be enough. In order to access all the settings provided by Elasticsearch, you will have to fall back to native query building:
QueryDescriptor query = ElasticsearchQueries.fromJson(
    "{ 'query': {"
        + "'match' : {"
            + "'vendorNames.vendorName': {"
                // Note that using a proper JSON framework would be better here, to avoid problems with quotes in the terms
                + "'query': '" + userProvidedTerms + "',"
                + "'operator': 'and',"
                + "'fuzziness': 'AUTO'"
            + "}"
        + "}"
    + " } }"
);
List<?> result = session.createFullTextQuery( query ).list();
See this page for details about what "AUTO" means in the above example.
Note that until Hibernate Search 6 is released, you can't mix native queries like the one shown above with the Hibernate Search DSL. Either you use the DSL or native queries, but not both in the same query.
Solution 2: ngrams
In my opinion, your best bet when the queries originate from your users, and those users are not Lucene experts, is to avoid parsing the queries altogether. Query parsing involves (at least in part) text analysis, and text analysis is best left to Lucene/Elasticsearch.
Then all you can do is configure the analyzers.
One way to add "fuzziness" with these tools would be to use an NGram filter. With min_gram = 3 and max_gram = 3, for example:
An indexed string such as "company" would be indexed as ["com", "omp", "mpa", "pan", "any"]
A query such as "compayn", once analyzed, would be translated to (essentially) com OR omp OR mpa OR pay OR ayn
Such a query would potentially match a lot of documents, but when sorting by score, the document for "Great Spaulding Company" would come up to the top, because it matches almost all of the ngrams.
I used parameter values min_gram = 3 and max_gram = 3 for the example, but in a real world application something like min_gram = 3 and max_gram = 5 would work better, since the added, longer ngrams would give a better score to search terms that match a longer part of the indexed terms.
Of course, if you can't sort by score, or if you can't accept too many trailing partial matches in the results, then this solution won't work for you.
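To make the ngram idea concrete, here is a minimal plain-Java sketch that splits a word into fixed-size ngrams and counts how many the query shares with the indexed term. It only mimics what an NGram filter produces; the class name and the overlap count are illustrative assumptions and are not how Lucene or Elasticsearch actually compute scores.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustration only: character ngrams computed by hand, without any analyzer.
final class NgramDemo {

    // Returns every substring of the given size, in order of appearance.
    static List<String> ngrams(String text, int size) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + size <= text.length(); i++) {
            grams.add(text.substring(i, i + size));
        }
        return grams;
    }

    // Counts the distinct ngrams of the query that also occur in the indexed term.
    static int sharedGrams(String indexed, String query, int size) {
        Set<String> shared = new LinkedHashSet<>(ngrams(query, size));
        shared.retainAll(ngrams(indexed, size));
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(ngrams("company", 3));
        System.out.println(ngrams("compayn", 3));
        System.out.println(sharedGrams("company", "compayn", 3));
    }
}
```

The misspelled "compayn" still shares three of its five trigrams with "company", which is why the matching document tends to rank near the top when sorting by score.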

Using Apache Solr's boost query function with Spring in Java

I'm writing a Java application that is using Apache Solr to index and search through a list of articles. A requirement I am dealing with is that when a user searches for something, we are supplying a list of recommended related search terms, and the user has the option to include those extra terms in their search. The problem I'm having, however, is that we want the user's original search term to be prioritized, and results that match that should appear before results that only match related terms.
My research suggests that Solr's boost function is the solution for this, but I'm having some trouble getting it to work with Spring. The code all runs fine and I get my search results as expected, but the boost function doesn't seem to actually be re-ordering my searches at all. For example, I'm trying to do something like this:
Query query = new SimpleQuery();
Criteria searchCriteria = Criteria.where("title").contains("A").boost((float) 2);
Criteria extraCriteria = Criteria.where("title").contains("B").boost((float) 1);
query.addCriteria(searchCriteria.or(extraCriteria));
In this example I would be searching for any document whose title contains "A" or "B", but I want to boost results that match "A" to the top of the list.
I've also tried using the Extended DisMax Query Parser with a different syntax to achieve the same result, with similar lack of success. To follow the same example pattern, I'm trying to use the expression criteria as follows:
Query query = new SimpleQuery();
Criteria searchCriteria = Criteria.where("title").expression("A^2.0 OR B^1.0");
query.setDefType("edismax");
query.addCriteria(searchCriteria);
Again I would expect this to return documents with titles matching "A" or "B" but boost results matching "A", and again it simply doesn't seem to actually affect the ordering of my results at all.

Okay, I figured out the problem here. Elsewhere in the code someone else had added this snippet:
query.setPageRequest(pageable);
This was done to support pagination of the search results, but the pageable object ALSO contained some sort orders, which look like they got added to the query as part of the .setPageRequest method. Something to look out for in the future: it looks like sorts override boosting when working with Spring Solr queries in this scenario.

IN Equivalent Query In Solr and Solrj

I am using Solr 5.0.0. I would like to know the equivalent of an IN query in Solr or Solrj.
If I need to query products of different brands, I can use an IN clause. For example, with brands like dell, sony and samsung, I need to find the products of these brands using Solr and, in Java, Solrj.
Now I am using this code in Solrj:
qry.addFilterQuery("brand:dell OR brand:sony OR brand:samsung");
I know that I can use OR here, but I need to know whether there is an IN in Solr, and how OR performs.

As you can read in Solr's wiki about its query syntax, Solr uses by default a superset of Lucene's query parser. As you can see when reading both documents, something like IN does not exist. But you can get shorter than the example query you presented.
In case that your default operator is OR you can leave it out from the query. In addition you can make use of Field Grouping.
qry.addFilterQuery("brand:(dell sony samsung)");
In case OR is not your default operator or you are not sure about this, you can employ Local Parameters for the filter query so that OR is enforced. Afterwards you can again make use of Field Grouping.
qry.addFilterQuery("{!q.op=OR}brand:(dell sony samsung)");
Keep in mind that you need to surround a phrase with double quotes (") to keep the words together:
qry.addFilterQuery("{!q.op=OR}brand:(dell sony samsung \"packard bell\")");
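If the list of brands comes from a collection, the filter string can be assembled programmatically. The following is a minimal sketch with a made-up helper name; it quotes values containing a space, as described above, but does not escape other special query characters, so it assumes reasonably clean input.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that builds an IN-like filter using field grouping.
final class SolrInFilter {

    // Produces e.g. {!q.op=OR}brand:(dell sony "packard bell").
    // Values containing a space are wrapped in double quotes to keep them together.
    static String in(String field, List<String> values) {
        String grouped = values.stream()
                .map(value -> value.contains(" ") ? "\"" + value + "\"" : value)
                .collect(Collectors.joining(" "));
        return "{!q.op=OR}" + field + ":(" + grouped + ")";
    }
}
```

The result can then be passed to the filter query, for example qry.addFilterQuery(SolrInFilter.in("brand", brands)), where brands is whatever list the application already has.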

GAE - Java - Best way to do a query filter "LIKE"

In my GAE Datastore I have "Person" Entity with name, surname and country
I need to do a query like
"SELECT * FROM Country WHERE name LIKE '%spa%'"
This answer offers a solution like this:
Query query = new Query("Person");
query.addFilter("name", FilterOperator.GREATER_THAN_OR_EQUAL, "pe");
query.addFilter("name", FilterOperator.LESS_THAN, "pe"+ "\uFFFD");
But I haven't had any success: it always returns 0 results. Am I missing something?
It seems that another alternative is using the "Search API", but how do I migrate all my "Person" data from the Datastore to new Documents in order to search them?
Any solutions?
Thanks

That answer is not the same as your question. The query they provide is a prefix query, i.e. all names that start with "pe". You seem to want a query for all names which contain "pe" anywhere, which is not possible for the reasons explained in the accepted answer to that question.
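The difference can be seen with plain string comparisons. The sketch below applies the same two bounds as the filters in the question to ordinary Java strings; the class name is made up, and Java's compareTo is only a stand-in for the Datastore's own string ordering, which may differ in corner cases.

```java
// Illustration only: reproduces the range "prefix <= name < prefix + \uFFFD"
// with Java string ordering instead of real Datastore filters.
final class PrefixRangeDemo {

    // True when the name falls inside the range that the two filters describe.
    static boolean inPrefixRange(String name, String prefix) {
        return name.compareTo(prefix) >= 0
                && name.compareTo(prefix + "\uFFFD") < 0;
    }

    public static void main(String[] args) {
        System.out.println(inPrefixRange("spain", "spa"));    // name starts with the prefix
        System.out.println(inPrefixRange("hispania", "spa")); // prefix appears only in the middle
    }
}
```

A name that merely contains the text sorts outside the range, so a pair of inequality filters can never emulate LIKE '%spa%'.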
The Search API is indeed the answer to doing this, and the details of how to create documents to represent your datastore objects are contained in the link you posted. (Note this isn't a migration: your data should stay in the datastore, the Search API is a separate system used only for full-text search.)

HibernateSearch query

I am new to Hibernate Search and I am having difficulty forming a Hibernate Search query.
I need to use an IN operator on a list of strings in the query.
Can anybody help me sort out this issue?
My current query looks like this:
String querystring="country:"+profile.getCountry()+" AND religion:"+profile.getReligion()+" AND caste:"+profile.getCaste()+" AND gender:"+profile.getGender()+" AND profession:"+professions+" AND age:["+profile.getFromage()+" TO "+profile.getToage()+"]";
Here, professions is a list of strings.
Regards,
Arun

There is no IN operator in the Lucene query language. You will have to expand the collection into the query string yourself. An alternative to using the query parser would be to use a Lucene BooleanQuery and add the different parts of your query to it, for example a RangeQuery etc. Effectively, the QueryParser creates these lower-level queries for you under the hood. Have a look at the Lucene API and the different subclasses of org.apache.lucene.search.Query. You still have to expand the collection yourself, though.
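Expanding the collection by hand could look roughly like the following. This is a sketch under assumptions: the helper name is made up, each value is quoted as a phrase and only backslashes and double quotes are escaped, and an empty list is not handled (it would produce an invalid empty group).

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that expands a list into an OR group for the query parser.
final class LuceneInClause {

    // Produces e.g. profession:("doctor" OR "software engineer").
    static String anyOf(String field, List<String> values) {
        return values.stream()
                // Escape backslashes first, then quotes, and wrap the value as a phrase.
                .map(value -> "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(" OR ", field + ":(", ")"));
    }
}
```

The returned fragment can be concatenated into the rest of the query string in place of the raw list.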
Last but not least, you could use the Hibernate Search query DSL. Have a look at the online docs of Hibernate Search if you want to know more - http://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#search-query-querydsl

You need to add the SELECT, FROM, and WHERE clauses to your query. The conditions are also missing parts. For example, here is a valid query: "SELECT e FROM Employee e WHERE e.country = :country AND e.religion = :religion"...
