Hibernate Search java spring, searching only entities that have ids as specified

I have a problem when I want to search only in entities that have specific IDs. I have a fullTextQuery that I execute and it works fine, but when I want to say
ONLY SEARCH IN THESE ENTITIES (list of IDs provided):
+(title:slovakia~2 leadText:slovakia~2 body:slovakia~2 software:slovakia~2) +verified:true +eid:(113 | 112 | 3)
then I get 0 results. These entities are indexed and persisted, so everything should work, yet the query doesn't return any results.
Here is the entity property definition:
@Id
@GeneratedValue
@Field(name = "eid")
@FieldBridge(impl = LongBridge.class)
private long id;
I have tried it without the field bridge, with TermVector.YES, and also without any additional @Field annotation. Every attempt ends in either an exception or no results.
What is the proper way of searching in specific IDs?
For instance, here is how the query is created:
return Optional.of(getQueryBuilder()
.keyword()
.onField("eid")
.matching(stringBuilder.toString())
.createQuery());

The syntax you tried to use, (113 | 112 | 3), is not correct in this context. Parameters to the keyword query are not interpreted, in particular operators are not supported.
Use a boolean junction that matches any of the provided IDs instead:
List<String> eids = ...;
QueryBuilder qb = getQueryBuilder();
BooleanJunction<?> idJunction = qb.bool();
for (String eid : eids) {
    idJunction.should(
        qb.keyword()
            .onField("eid")
            .matching(eid)
            .createQuery()
    );
}
return idJunction.createQuery();
Note that, if you want to add other queries, you should not use the same junction. Use another junction that includes idJunction.createQuery() as one of its clauses.
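To make the nesting concrete, here is a minimal plain-Java sketch that only builds the query string such a nested junction would roughly correspond to in classic Lucene syntax. It does not call Hibernate Search at all; the class and method names (IdClauseSketch, anyOf, restrictTo) are made up for illustration, and the exact rendering of a real numeric field may differ.

```java
import java.util.List;
import java.util.stream.Collectors;

class IdClauseSketch {
    // Inner junction: one optional (SHOULD) clause per id, e.g. "(eid:113 eid:112 eid:3)".
    static String anyOf(String field, List<Long> ids) {
        return ids.stream()
                .map(id -> field + ":" + id)
                .collect(Collectors.joining(" ", "(", ")"));
    }

    // Outer junction: the main query and the id junction are both required (MUST, rendered as "+").
    static String restrictTo(String mainQuery, String field, List<Long> ids) {
        return "+(" + mainQuery + ") +" + anyOf(field, ids);
    }
}
```

The id clauses sit in their own parenthesized group, so "match any of these ids" stays a single required clause of the outer query instead of mixing with the other clauses.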

From the little experience I have had with Hibernate Search, only range queries seem to work well with integer and long fields. In your example, I expect the following query to work:
QueryBuilder qb = getQueryBuilder();
BooleanJunction<?> idJunction = qb.bool();
idJunction.must(NumericRangeQuery.newLongRange("eid", Long.valueOf(eid), Long.valueOf(eid), true, true));
Here the boxing with Long.valueOf() is optional if the supplied values are already Long values.

Related

Implementing NULLS LAST with JPA CriteriaBuilder API

Spring/Hibernate/MySQL/JPA here. I have the following code:
public void setOrdering(
SearchRequest searchRequest,
CriteriaQuery query,
CriteriaBuilder builder,
Root<? extends MyEntity> root) {
String sortParam = "reportedOn";
Expression expression = builder.selectCase()
.when(builder.isNull(root.get(sortParam)), root.get(sortParam))
.otherwise(root.get(sortParam));
Order order = (searchRequest.isAscending())
? builder.asc(expression)
: builder.desc(expression);
query.orderBy(order);
}
Basically, I'm trying to implement the CriteriaBuilder/JPA equivalent of:
SELECT
*
FROM
mytable
WHERE
<lots of predicates here>
ORDER BY reported_on IS NULL, reported_on <ASC/DESC>
I already have the WHERE predicates added, I'm just struggling with the query.orderBy(...).
At runtime, when searchRequest.isAscending() is false, the results come back working just fine, with the records that contain a null reported_on value ordered at the end of the results.
But if searchRequest.isAscending() is true, the NULLS LAST attempt does not appear to work at all.

You're mixing up the Spring and JPA APIs. Here, query comes from the JPA API, so you need to sort using something like:
CriteriaBuilder cb = ...
Root root = ...
query.orderBy(cb.asc(root.get("reportedOn")));

It does not look like JPA's CriteriaBuilder supports NULLS LAST. I actually got this working using a SQL "hack":
String sortParam = "reportedOn";
Order order = (searchRequest.isAscending())
? builder.desc(builder.neg(root.get(sortParam)))
: builder.desc(root.get(sortParam));
query.orderBy(order);
Basically ORDER BY -reported_on DESC does the same thing as ORDER BY reported_on ASC but it sorts records with NULL reported_on values all the way to the bottom of the search results, which is what NULLS LAST is supposed to do.
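The effect of the negation trick can be checked outside the database with a small plain-Java sketch. It assumes MySQL-style ordering (NULL sorts before every non-NULL value in ascending order) and represents reported_on as plain long values; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class NullsLastSketch {
    // Assumption: MySQL-style ordering, where NULL sorts first in ascending order.
    static final Comparator<Long> MYSQL_ASC =
            Comparator.nullsFirst(Comparator.<Long>naturalOrder());

    // Emulates ORDER BY -value DESC: negate each value, then sort descending.
    static List<Long> orderByNegatedDesc(List<Long> values) {
        Comparator<Long> negatedDesc = Comparator.comparing(
                (Long v) -> v == null ? null : Long.valueOf(-v),
                MYSQL_ASC.reversed());
        return values.stream().sorted(negatedDesc).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [1, 2, 3, null]
        System.out.println(orderByNegatedDesc(Arrays.asList(3L, null, 1L, 2L)));
    }
}
```

Reversing an ascending order that puts NULL first yields a descending order that puts NULL last, and negating the values turns that descending order back into ascending order for the non-NULL rows.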

Hibernate Search: Search any part of the field without losing field's content while indexing

I would like to be able to find an entity based on any part of its indexed fields, and the fields must not lose any content while indexing.
Let's say I have the following sample entity class:
@Entity
public class E {
    private String f;
    // ...
}
And if the value of f in one entity is "This is a nice field!", I would like to be able to find it by any of these queries:
"this"
"a"
"IC"
"!"
"This is a nice field!"
The most obvious decision is to annotate the entity this way:
@Entity
@Indexed
@AnalyzerDef(name = "a",
    tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
    filters = @TokenFilterDef(factory = LowerCaseFilterFactory.class)
)
@Analyzer(definition = "a")
public class E {
    @Field
    private String f;
    // ...
}
And then search the following way:
String queryString;
// ...
org.apache.lucene.search.Query query = queryBuilder
.keyword()
.wildcard()
.onField("f")
.matching("*" + queryString.toLowerCase() + "*")
.createQuery();
But it is stated in the documentation that for performance purposes, it is recommended that the query does not start with either ? or *.
So as I understand, this method is ineffective.
The other idea is to use n-grams like this:
@Entity
@Indexed
@AnalyzerDef(name = "a",
    tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = NGramFilterFactory.class,
            params = {
                @Parameter(name = "minGramSize", value = "1"),
                @Parameter(name = "maxGramSize", value = E.MAX_LENGTH)
            })
    }
)
@Analyzer(definition = "a")
public class E {
    static final String MAX_LENGTH = "42";
    @Field
    private String f;
    // ...
}
And create queries this way:
String queryString;
// ...
org.apache.lucene.search.Query query = queryBuilder
.keyword()
.onField("f")
.ignoreAnalyzer()
.matching(queryString.toLowerCase())
.createQuery();
This time no wildcard queries are used and the analyzer in the query is ignored. I'm not sure whether ignoring the analyzer is good or bad, but it works with analyzer ignored.
Another possible solution would be to use WhitespaceTokenizerFactory instead of KeywordTokenizerFactory when using n-grams, then split queryString by spaces and combine the searches for each substring using MUST.
With this approach, as I understand it, far fewer n-grams are built when the string contained in f has length E.MAX_LENGTH, which should be good for performance. I would also be able to find the previously described entity with, for example, the query "hi ield". That would be ideal.
So what would be the best way to deal with my problem? Or are all my ideas bad?
P.S. Should one ignore analyzer in queries when using n-grams?

Another possible solution would be to use WhitespaceTokenizerFactory instead of KeywordTokenizerFactory when using n-grams, then split queryString by spaces and combine the searches for each substring using MUST. With this approach, as I understand it, far fewer n-grams are built when the string contained in f has length E.MAX_LENGTH, which should be good for performance. I would also be able to find the previously described entity with, for example, the query "hi ield". That would be ideal.
This is more or less the ideal solution, except for one thing: you shouldn't ignore the analyzer when querying. What you should do is define another analyzer without the ngram filter, but with the tokenizer, lowercase filter, etc., and explicitly instruct Hibernate Search to use that analyzer at query time.
The other solutions are too expensive, either in I/O and CPU at query time (first solution) or in storage space (second solution). Note that this third solution may still be rather expensive in storage space, depending on the value of E.MAX_LENGTH. It's generally recommended to only have a difference of one or two between minGramSize and maxGramSize, to avoid the indexing of too many grams.
Just define another analyzer, name it something like "ngram_query", and when you need to build the query, create the query builder like this:
QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(E.class)
    .overridesForField( "f" /* name of the field */, "ngram_query" )
    .get();
Then create your query as usual.
Note that, if you rely on Hibernate Search to push the index schema and analyzers to Elasticsearch, you will have to use a hack in order for the query-only analyzer to be pushed: by default only the analyzers that are actually used during indexing are pushed. See https://discourse.hibernate.org/t/cannot-find-the-overridden-analyzer-when-using-overridesforfield/1043/4
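To get a feel for the storage cost mentioned above, the following plain-Java sketch counts the grams produced for a token under different minGramSize/maxGramSize settings. It only mimics the idea of an n-gram filter and is not Lucene's implementation; NGramCountSketch and ngrams are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class NGramCountSketch {
    // All substrings of token whose length is between minGram and maxGram (inclusive).
    static List<String> ngrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int size = minGram; size <= maxGram; size++) {
            for (int start = 0; start + size <= token.length(); start++) {
                grams.add(token.substring(start, start + size));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Whole value as one token (keyword tokenizer), grams of size 1..42: prints 231
        System.out.println(ngrams("this is a nice field!", 1, 42).size());
        // One word, grams of size 3 only: prints [fie, iel, eld]
        System.out.println(ngrams("field", 3, 3));
    }
}
```

Under these assumptions the 21-character sample value yields 231 grams as a single token, while splitting it on whitespace first yields 45 grams in total (10 + 3 + 1 + 10 + 21), and a narrow gram range reduces the count further.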

Hibernate search on prefixes

Right now, I have successfully configured a basic Hibernate Search index to be able to search for full words on various fields of my JPA entity:
@Entity
@Indexed
class Talk {
    @Field String title
    @Field String summary
}
And my query looks something like this:
List<Talk> search(String text) {
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager)
QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(Talk).get()
Query query = queryBuilder
.keyword()
.onFields("title", "summary")
.matching(text)
.createQuery()
FullTextQuery jpaQuery = fullTextEntityManager.createFullTextQuery(query, Talk)
return jpaQuery.getResultList()
}
Now I would like to fine-tune this setup so that when I search for "test" it still finds talks where title or summary contains "test" even as the prefix of another word. So talks titled "unit testing", or whose summary contains "testicle" should still appear in the search results, not just talks whose title or summary contains "test" as a full word.
I've tried to look at the documentation, but I can't figure out whether I should change something in the way my entity is indexed, or whether it has something to do with the query. Note that I wanted to do something like the following, but then it's hard to search on several fields:
Query query = queryBuilder
.keyword().wildcard()
.onField("title")
.matching(text + "*")
.createQuery()
EDIT:
Based on Hardy's answer, I configured my entity like so:
@Indexed
@Entity
@AnalyzerDefs([
    @AnalyzerDef(name = "ngram",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = [
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = NGramFilterFactory.class,
                params = [
                    @Parameter(name = "minGramSize", value = "3"),
                    @Parameter(name = "maxGramSize", value = "3")
                ])
        ])
])
class Talk {
    @Field(analyzer=@Analyzer(definition="ngram")) String title
    @Field(analyzer=@Analyzer(definition="ngram")) String summary
}
Thanks to that configuration, when I search for 'arti', I get Talks whose title or summary contains words that have 'arti' as a subword (artist, artisanal, etc.). Unfortunately, after those I also get Talks whose title or summary contains words that merely share a subword with my search term (arts, fart, etc.). There's probably some fine-tuning to eliminate those, but at least I get results sooner now, and they are in a sensible order.
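The extra matches can be reproduced with a small plain-Java sketch, assuming the search term is split into trigrams by the same analyzer and a document matches as soon as it shares any one gram with the query. This is only a model of the behaviour, not Lucene code; TrigramOverlapSketch and its methods are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class TrigramOverlapSketch {
    // Lowercased trigrams of a word, mirroring minGramSize = maxGramSize = 3.
    static Set<String> trigrams(String word) {
        Set<String> grams = new LinkedHashSet<>();
        String lower = word.toLowerCase();
        for (int i = 0; i + 3 <= lower.length(); i++) {
            grams.add(lower.substring(i, i + 3));
        }
        return grams;
    }

    // Grams that the analyzed query and the indexed word have in common.
    static Set<String> sharedGrams(String query, String indexedWord) {
        Set<String> shared = trigrams(query);
        shared.retainAll(trigrams(indexedWord));
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(sharedGrams("arti", "artist")); // prints [art, rti]
        System.out.println(sharedGrams("arti", "fart"));   // prints [art]
    }
}
```

In this model "artist" shares both query grams while "arts" and "fart" share only "art", which is consistent with the full matches being listed first and the partial ones after them.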

There are multiple things you can do here. A lot can be done via the proper analyzing during index time.
For example, you want to apply a stemmer appropriate for your language. For English this is generally the Snowball stemmer. The idea is that during indexing all words are reduced to their stem, for example testing and tested to test. This gets you part of the way.
The other thing you can look into is n-gram indexing. According to your description you want to find matches in unrelated words as well. The idea here is to index "subwords" of each word, so that they can be found later.
Regarding analyzers, you want to look at the named analyzers section of the Hibernate Search docs. The key here is the @AnalyzerDef annotation.
On the query side you can also apply some "tricks". Indeed you can use wildcard queries, however, if you are using the Hibernate Search query DSL, you cannot use a keyword query, but you need to use a wildcard query. Again, check the Hibernate Search docs.

You should use an NGram or EdgeNGram filter for indexing, as you correctly noted in your answer. But you should use a different analyzer for your queries, as suggested in the Elasticsearch documentation (see search_analyzer):
https://www.elastic.co/guide/en/elasticsearch/guide/current/_index_time_search_as_you_type.html
This way your search query wouldn't be tokenized to n-grams and your results would be more like %text% or text% in SQL.
Unfortunately, for unknown reasons, Hibernate Search currently doesn't support specifying a search_analyzer on fields. You can only specify an analyzer for indexing, which is then also used for search query analysis.
I plan to implement this functionality myself.
EDIT:
You can specify search-time analyzer (search_analyzer) like this:
List<Talk> search(String text) {
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager)
EntityContext entityContext = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(Talk);
entityContext.overridesForField("myField", "myNamedAnalyzerDef");
QueryBuilder queryBuilder = entityContext.get()
Query query = queryBuilder
.keyword()
.onFields("title", "summary")
.matching(text)
.createQuery()
FullTextQuery jpaQuery = fullTextEntityManager.createFullTextQuery(query, Talk)
return jpaQuery.getResultList()
}
I have used this technique to effectively simulate Lucene search_analyzer property.

In Lucene version 4.9 I used the EnglishAnalyzer for this. I think it is an English-only implementation of the SnowballAnalyzer, but I am not 100% certain. I used it both for creating and for searching the indexes. Nothing special is needed to use it.
Analyzer analyzer = new EnglishAnalyzer(Version.LUCENE_4_9);
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_9, analyzer);
and
analyzer = new EnglishAnalyzer(Version.LUCENE_4_9);
parser = new StandardQueryParser(analyzer);
You can see it in action at Guided Code Search. This runs exclusively off Lucene.
Lucene can be integrated into Hibernate searches, but I haven't tried that myself yet. It seems like it would be powerful, but I don't know. See Apache Lucene™ Integration.
I've also read that lucene can be patched into SQL engines, but I haven't tried that either. Example: Indexing Databases with Lucene.

Hibernate Criteria -- return records where column is distinct

Sample database table:
ID = 1, msgFrom = 'Hello', foobar = 'meh'
ID = 2, msgFrom = 'Goodbye', foobar = 'comments'
ID = 3, msgFrom = 'Hello', foobar = 'response'
Sample desired output (generated by hibernate query):
ID = 1, msgFrom = 'Hello', foobar = 'meh'
ID = 2, msgFrom = 'Goodbye', foobar = 'comments'
In the above example, the third record would be excluded from the results since the msgFrom column is the same. Let's say the Java/Hibernate class is called Message. I would like the results to be returned as a list of Message objects (or Objects that can be cast to Message, anyway). I want to use the Criteria API if possible. I saw this example on SO and it seems similar but I cannot implement it correctly as of yet.
select e from Message e
where e.msgFrom IN (select distinct m.msgFrom
from Message m
WHERE m.msgTo = ?
AND m.msgCheck = 0
The reason I am doing this is to have the filtering of distinct records done on the database, so I am not interested in answers where I have to filter anything on the application server.
edit: Article showing basically what I want to do. http://oscarvalles.wordpress.com/2008/01/28/sql-distinct-on-one-column-only/

Please try this and let me know
DetachedCriteria msgFromCriteria = DetachedCriteria.forClass(Message.class);
ProjectionList properties = Projections.projectionList();
properties.add(Projections.groupProperty("messageFrom"));
properties.add(Projections.min("id"),"id");
msgFromCriteria.setProjection(properties);
Criteria criteria = s.createCriteria(Message.class);
criteria.add(Subqueries.propertiesIn(new String[]{"messageFrom","id"},
msgFromCriteria));
List<Message> list = criteria.list();
for(Message message:list){
System.out.println(message.getId()
+"-------"
+message.getMessageFrom()
+"-----"
+message.getFoobar());
}

The difficulty with this query is not so much with Hibernate, per se, but with the relational model in general. In the example, you say you expect rows 1 and 2, but why wouldn't you just as easily expect rows 2 and 3? It would be an arbitrary decision whether to return row 1 or row 3 since they both have the same value in the msgFrom field. Databases won't make arbitrary decisions like this. That's why distinct must be applied to the entire list of select columns, not a subset. There are database-specific ways of grabbing the first matching rows. For example, have a look at
SELECT DISTINCT on one column
Sometimes there will be a date column that you can use to decide which of the matching rows to return, but again the queries get somewhat complex:
How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?
Fetch the row which has the Max value for a column
If you don't care about any of the other columns, you can just use a simple distinct, combined with Hibernate's constructor syntax (not tested):
select new Message(msgFrom) from (select distinct msgFrom from Message)
but you have to accept throwing away all the other columns.
In the end, I often end up just doing this in code as a post-query filter. Another option is to create another table, say CurrentMessage, that includes msgFrom as part of the key. There will be more work in keeping this table up to date (you need to update a row every time you add a row to the Message table), but querying will be much easier.
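For reference, such a post-query filter could look roughly like the plain-Java sketch below. It assumes that "the row with the lowest id wins" is the desired tie-break and uses a stand-in Message class with the three columns from the question; all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DistinctBySketch {
    // Stand-in for the mapped entity; only the columns from the sample table.
    static final class Message {
        final long id;
        final String msgFrom;
        final String foobar;

        Message(long id, String msgFrom, String foobar) {
            this.id = id;
            this.msgFrom = msgFrom;
            this.foobar = foobar;
        }
    }

    // Keeps, for each msgFrom value, the row with the lowest id.
    static List<Message> firstPerSender(List<Message> rows) {
        Map<String, Message> bySender = new LinkedHashMap<>();
        rows.stream()
                .sorted(Comparator.comparingLong((Message m) -> m.id))
                .forEach(m -> bySender.putIfAbsent(m.msgFrom, m));
        return new ArrayList<>(bySender.values());
    }
}
```

The tie-break is explicit here, which is exactly the decision the database refuses to make on its own.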

DetachedCriteria msgFromCriteria = DetachedCriteria.forClass(Message.class);
msgFromCriteria.setProjection(Projections.distinct(Projections.property("msgFrom")));
....
Criteria criteria = getSession().createCriteria(Message.class);
criteria.add(Subqueries.propertyIn("msgFrom", msgFromCriteria));
criteria.list();

How do I use boolean operators with Hibernate Search

I'm learning the Hibernate Search Query DSL, and I'm not sure how to construct queries using boolean arguments such as AND or OR.
For example, let's say that I want to return all person records that have a firstName value of "bill" or "bob".
Following the hibernate docs, one example uses the bool() method w/ two subqueries, such as:
QueryBuilder b = fts.getSearchFactory().buildQueryBuilder().forEntity(Person.class).get();
Query luceneQuery = b.bool()
.should(b.keyword().onField("firstName").matching("bill").createQuery())
.should(b.keyword().onField("firstName").matching("bob").createQuery())
.createQuery();
logger.debug("query 1:{}", luceneQuery.toString());
This ultimately produces the lucene query that I want, but is this the proper way to use boolean logic with hibernate search? Is "should()" the equivalent of "OR" (similarly, does "must()" correspond to "AND")?.
Also, writing a query this way feels cumbersome. For example, what if I had a collection of firstNames to match against? Is this type of query a good match for the DSL in the first place?

Yes, your example is correct. The boolean operators are called should instead of OR because of the names they have in the Lucene API and documentation, and because it is more appropriate: a should clause not only influences a boolean decision, it also affects the scoring of the result.
For example, if you search for cars "of brand Fiat" OR "blue", the cars that are both Fiat AND blue will also be returned, and they will have a higher score than those which are blue but not Fiat.
It might feel cumbersome because it's programmatic and provides many detailed options. A simpler alternative is to use a simple string for your query and use the QueryParser to create the query. Generally the parser is useful to parse user input, the programmatic one is easier to deal with well defined fields; for example if you have the collection you mentioned it's easy to build it in a for loop.
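The difference between should and must can be illustrated with a toy model in plain Java. It is not Lucene's scoring formula: the score is simply the number of satisfied clauses, and a document is rejected (score -1) when a must clause fails or when there are no must clauses and no should clause matches. All names in the sketch are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class BooleanClauseSketch {
    // Tiny "document" with two fields, as in the Fiat/blue example.
    static Map<String, String> car(String brand, String color) {
        Map<String, String> doc = new HashMap<>();
        doc.put("brand", brand);
        doc.put("color", color);
        return doc;
    }

    // Returns -1 for "no match", otherwise the number of satisfied clauses as a crude score.
    static int score(Map<String, String> doc,
                     List<Predicate<Map<String, String>>> must,
                     List<Predicate<Map<String, String>>> should) {
        int mustHits = 0;
        for (Predicate<Map<String, String>> clause : must) {
            if (!clause.test(doc)) {
                return -1; // a failed must clause always rejects the document
            }
            mustHits++;
        }
        int shouldHits = 0;
        for (Predicate<Map<String, String>> clause : should) {
            if (clause.test(doc)) {
                shouldHits++;
            }
        }
        if (must.isEmpty() && shouldHits == 0) {
            return -1; // with only should clauses, at least one has to match
        }
        return mustHits + shouldHits;
    }
}
```

With two should clauses (brand Fiat, color blue), a blue Fiat gets 2, a blue Ford gets 1, and a red Ford is rejected, which mirrors the OR-with-ranking behaviour described above.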

You can also use BooleanQuery. I would prefer this because you can use it in a loop over a list.
org.hibernate.search.FullTextQuery hibque = null;
org.apache.lucene.search.BooleanQuery bquery = new BooleanQuery();
QueryBuilder qb = fulltextsession.getSearchFactory().buildQueryBuilder()
.forEntity(entity.getClass()).get();
for (String keyword : list) {
bquery.add(qb.keyword().wildcard().onField(entityColumn).matching(keyword)
.createQuery() , BooleanClause.Occur.SHOULD);
}
if (!filterColumn.equals("") && !filterValue.equals("")) {
bquery.add(qb.keyword().wildcard().onField(filterColumn).matching(filterValue).createQuery()
, BooleanClause.Occur.MUST);
}
hibque = fulltextsession.createFullTextQuery(bquery, entity.getClass());
int num = hibque.getResultSize();

To answer your secondary question:
For example, what if I had a collection of firstNames to match against?
I'm not an expert, but according to (the third example from the end of) 5.1.2.1. Keyword queries in Hibernate Search Documentation, you should be able to build the query like so:
Collection<String> namesCollection = getNames(); // Contains "billy" and "bob", for example
StringBuilder names = new StringBuilder(100);
for(String name : namesCollection) {
names.append(name).append(" "); // Never mind the space at the end of the resulting string.
}
QueryBuilder b = fts.getSearchFactory().buildQueryBuilder().forEntity(Person.class).get();
Query luceneQuery = b.bool()
.should(
// Searches for multiple possible values in the same field
b.keyword().onField("firstName").matching( names.toString() ).createQuery()
)
.must(b.keyword().onField("lastName").matching("thornton").createQuery())
.createQuery();
and, have as a result, Persons with (firstName preferably "billy" or "bob") AND (lastName = "thornton"), although I don't think it will give the good ol' Billy Bob Thornton a higher score ;-).

I was looking into the same topic, but my issue is somewhat different from the one presented: I was looking for an actual OR junction. The should case didn't work for me, as results that didn't match either of the two expressions were still returned, only with a lower score. I wanted to omit these results completely. You can, however, create an actual boolean OR expression by using a separate boolean expression for which you disable scoring:
val booleanQuery = cb.bool();
val packSizeSubQuery = cb.bool();
packSizes.stream().map(packSize -> cb.phrase()
.onField(LUCENE_FIELD_PACK_SIZES)
.sentence(packSize.name())
.createQuery())
.forEach(packSizeSubQuery::should);
booleanQuery.must(packSizeSubQuery.createQuery()).disableScoring();
val persistenceQuery = fullTextEntityManager.createFullTextQuery(booleanQuery.createQuery(), Product.class);
return persistenceQuery.getResultList();
