How to search case-insensitively in Hibernate Search using a Lucene query? - java

I use two analyzers at indexing time: StandardAnalyzer for some fields, and WhitespaceAnalyzer for fields whose values contain special characters, such as c++. I build my query like this:
QueryBuilder qb = fullTextSession.getSearchFactory().buildQueryBuilder().forEntity(Professional.class).get();
BooleanQuery booleanQuery = new BooleanQuery();
query = qb .keyword().wildcard().onField(fieldName).ignoreFieldBridge().matching(fieldValue+"*").createQuery();
booleanQuery.add(query, BooleanClause.Occur.MUST);
This query is case-sensitive: searching for c++ and C++ returns different results.
I want the results to be case-insensitive. Is the problem that I am not using the same analyzer for indexing and for searching? Am I doing something wrong?
Please help, I have been stuck on this for a week.
Thanks in advance.

I had the same issue. I was using keyword().wildcard() on one field, and search terms that were not written in lowercase could not be found.
The solution was very simple: instead of implementing an Analyzer, I converted the search term to lowercase before building the query. In your case it would look like this:
fieldValue = fieldValue.toLowerCase();
QueryBuilder qb = fullTextSession.getSearchFactory().buildQueryBuilder().forEntity(Professional.class).get();
BooleanQuery booleanQuery = new BooleanQuery();
query = qb.keyword().wildcard().onField(fieldName).ignoreFieldBridge().matching(fieldValue+"*").createQuery();
booleanQuery.add(query, BooleanClause.Occur.MUST);
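This works because Hibernate Search does not run wildcard terms through the analyzer, so the term has to be normalized by hand. It only helps for fields whose indexed tokens are lowercase as well; a field indexed with a plain WhitespaceAnalyzer keeps C++ as typed. A minimal sketch of the normalization step, using only the JDK (the class and method names are invented for illustration); Locale.ROOT keeps the result independent of the server's default locale:

```java
import java.util.Locale;

final class WildcardTerms {
    private WildcardTerms() {}

    // Hypothetical helper: trims and lowercases the user input,
    // then appends the trailing wildcard used by the prefix search.
    static String toPrefixPattern(String userInput) {
        return userInput.trim().toLowerCase(Locale.ROOT) + "*";
    }

    public static void main(String[] args) {
        System.out.println(toPrefixPattern("C++")); // prints c++*
    }
}
```

The returned string is what would be passed to matching(...) in the query above.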

You should use a custom analyzer and add a LowerCaseFilter after the WhitespaceTokenizer, like this:
Analyzer analyzer = new Analyzer() {
@Override
protected TokenStreamComponents createComponents(final String fieldName) {
Tokenizer source = new WhitespaceTokenizer();
// LowerCaseFilter is the token filter to use here; there is no LowerCaseAnalyzer class.
TokenStream filter = new LowerCaseFilter(source);
return new TokenStreamComponents(source, filter);
}
};

As of Hibernate Search 5.10.3, the syntax for creating a custom Lucene analyzer has changed slightly:
public class CustomAuthorAnalyzer extends Analyzer {
@Override
protected TokenStreamComponents createComponents(final String fieldName) {
Tokenizer source = new WhitespaceTokenizer();
TokenStream filter = new LowerCaseFilter(source);
return new TokenStreamComponents(source, filter);
}
}
Then, in order to use this analyzer on a custom field, we just need to specify it through the @Analyzer annotation:
@Analyzer(impl = CustomAuthorAnalyzer.class)
@Field(index = Index.YES, analyze = Analyze.YES, store = Store.YES)
private String author;
Hope that helps.
Alternatively, Lucene also provides a mechanism to easily compose a new custom analyzer:
public static Analyzer create() throws IOException {
Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
.withTokenizer(WhitespaceTokenizerFactory.class)
.addTokenFilter(LowerCaseFilterFactory.class)
.addTokenFilter(StopFilterFactory.class, "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
.build();
return ana;
}
The builder class makes it pretty easy to create composite analyzers.
More information here: https://lucene.apache.org/core/6_4_2/analyzers-common/org/apache/lucene/analysis/custom/CustomAnalyzer.html

Related

How can I translate a TupleExpr or a ParsedTupleQuery into the Query String?

I want to parse a query using rdf4j's SPARQLParser, modify the underlying query tree (=TupleExpr) and translate it back into a query string. Is there a way to do that with rdf4j?
I tried the following, but it didn't work:
SPARQLParser parser = new SPARQLParser();
ParsedQuery originalQuery = parser.parseQuery(query, null);
if (originalQuery instanceof ParsedTupleQuery) {
TupleExpr queryTree = originalQuery.getTupleExpr();
queryTree.visit(myQueryModelVisitor());
originalQuery.setTupleExpr(queryTree);
System.out.println(queryTree);
ParsedQuery tsQuery = new ParsedTupleQuery(queryTree);
System.out.println(tsQuery.getSourceString());
}
The printed output is null.
You'll want to use the org.eclipse.rdf4j.queryrender.sparql.experimental.SparqlQueryRenderer which is specifically designed to transform a TupleExpr back into a SPARQL query string.
Roughly, like this:
SPARQLParser parser = new SPARQLParser();
ParsedQuery originalQuery = parser.parseQuery(query, null);
if (originalQuery instanceof ParsedTupleQuery) {
TupleExpr queryTree = originalQuery.getTupleExpr();
queryTree.visit(myQueryModelVisitor());
originalQuery.setTupleExpr(queryTree);
System.out.println(queryTree);
ParsedQuery tsQuery = new ParsedTupleQuery(queryTree);
String transformedQuery = new SparqlQueryRenderer().render(tsQuery);
}
Note that this component is still experimental, and does not have guaranteed complete coverage of all SPARQL 1.1 features.
As an aside, the reason getSourceString() does not work here is that the method is designed to return the input source string from which the parsed query was generated. Since in your case you've just created a new ParsedQuery object from scratch, there is no source string.

How to filter a search query on multiple fields?

I have a list of studies, and I want to search them with a typeahead function on a front-end page.
For that I use Hibernate Search with Spring Boot (2.1.5).
I indexed my STUDY table, and some fields are marked with the @Field annotation to be indexed.
The search works well.
But now I want to add a filter so that the same typeahead function searches only a subset of my studies.
For that I created a filter as described in the Hibernate Search documentation, but I didn't find a way to filter on two fields with an OR between them.
My current filter, which only filters on one field (avisDefinitifCet):
/**
* etudeFilter
*/
public class EtudeFilterFactory {
private String clasCet1ErPassage;
private String avisDefinitifCet;
public void setClasCet1ErPassage(String clasCet1ErPassage) {
this.clasCet1ErPassage = clasCet1ErPassage;
}
public void setAvisDefinitifCet(String avisDefinitifCet) {
this.avisDefinitifCet = avisDefinitifCet;
}
@Factory
public Query getFilter() {
System.out.println("Filter avisDefinitifCet : " + this.avisDefinitifCet.toLowerCase());
return new TermQuery(new Term("avisDefinitifCet", this.avisDefinitifCet.toLowerCase()));
}
}
How can I also filter on a second field, in my case clasCet1ErPassage?
In the end, I want to run the standard search query and apply the filter, like this:
SELECT *
FROM STUDY
WHERE
(A=t OR B=t OR C=t) -- Normal search
AND (avisDefinitifCet='acceptation' OR clasCet1ErPassage='acceptation') -- Filter on two fields
My search function:
public List<Etude> search(String text, Map<String, String> allParams) {
text = stripAccents(text);
// get the full text entity manager
FullTextEntityManager fullTextEntityManager = getFullTextEntityManager(entityManager);
// create the query using Hibernate Search query DSL
QueryBuilder queryBuilder = fullTextEntityManager
.getSearchFactory()
.buildQueryBuilder()
.forEntity(Etude.class)
.get();
// Simple Query String queries
Query query = queryBuilder
.simpleQueryString()
.onFields("n0Cet")
.andField("anneeCet")
.andField("noDansAnneeCet")
.andField("sigleEtude")
.andField("titreEtude")
.andField("traitement1")
.andField("traitement2")
.andField("traitement3")
.andField("traitement4")
.andField("traitement5")
.andField("demandeurIgr.nomInvestigateurIgr")
.andField("investigateurHorsIgr.nomInvestigateur")
.andField("investigateurIgr.nomInvestigateurIgr")
.andField("promoteur.nomPromoteur")
.matching(text)
.createQuery();
// wrap Lucene query in an Hibernate Query object
FullTextQuery fullTextQuery = fullTextEntityManager
.createFullTextQuery(query, Etude.class)
.setMaxResults(101);
// Here allParams contains
// avisDefinitifCet => 'acceptation',
// clasCet1ErPassage => 'acceptation'
allParams.forEach((key, value) -> {
fullTextQuery.enableFullTextFilter("etudeFilter").setParameter(key, value);
});
return (List<Etude>) fullTextQuery.getResultList();
}
Am I going about this the right way, or am I on the wrong track?
EDIT: Apparently you're using full-text filters, where you do indeed have to use Lucene APIs directly.
In this case, just use a boolean junction with "should" clauses. When there are only "should" clauses in a boolean junction, at least one of them has to match.
@Factory
public Query getFilter() {
String valueToMatch = this.avisDefinitifCet.toLowerCase();
return new BooleanQuery.Builder()
.add(new TermQuery(new Term("avisDefinitifCet", valueToMatch)), Occur.SHOULD)
.add(new TermQuery(new Term("clasCet1ErPassage", valueToMatch)), Occur.SHOULD)
.build();
}
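To make the "should" semantics concrete, here is a minimal plain-Java model of such a junction. It does not use Lucene: the class and method names are invented for illustration, documents are reduced to maps of field name to value, and "refus" is a made-up non-matching value.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class ShouldOnlyJunction {
    private ShouldOnlyJunction() {}

    // Stand-in for a TermQuery: the field must hold exactly this value.
    static Predicate<Map<String, String>> term(String field, String value) {
        return doc -> value.equals(doc.get(field));
    }

    // Stand-in for a junction made only of "should" clauses:
    // the document matches if at least one clause matches.
    static boolean matches(Map<String, String> doc,
                           List<Predicate<Map<String, String>>> shouldClauses) {
        return shouldClauses.stream().anyMatch(clause -> clause.test(doc));
    }

    public static void main(String[] args) {
        List<Predicate<Map<String, String>>> filter = List.of(
                term("avisDefinitifCet", "acceptation"),
                term("clasCet1ErPassage", "acceptation"));

        Map<String, String> onlySecondFieldMatches = Map.of(
                "avisDefinitifCet", "refus",
                "clasCet1ErPassage", "acceptation");
        Map<String, String> neitherFieldMatches = Map.of(
                "avisDefinitifCet", "refus",
                "clasCet1ErPassage", "refus");

        System.out.println(matches(onlySecondFieldMatches, filter)); // true
        System.out.println(matches(neitherFieldMatches, filter));    // false
    }
}
```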
Previous answer:
If you're new to Lucene, you really should give the Hibernate Search DSL a try.
In your case, you'll want a keyword query that targets two fields:
EntityManager em = ...;
FullTextEntityManager fullTextEntityManager =
org.hibernate.search.jpa.Search.getFullTextEntityManager(em);
QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory()
.buildQueryBuilder().forEntity( Etude.class ).get();
Query luceneQuery = queryBuilder.keyword()
.onField("avisDefinitifCet").andField("clasCet1ErPassage")
.matching("acceptation")
.createQuery();
Note that, despite the method being called andField, it's actually an "OR": the query will match if any field matches.
For more advanced combination of queries, have a look at boolean junctions.
Here is what I used to solve my problem, thanks to @yrodiere:
@Factory
public Query getFilter() {
return new BooleanQuery.Builder()
.add(new TermQuery(new Term("avisDefinitifCet", this.avisDefinitifCet.toLowerCase())), BooleanClause.Occur.SHOULD)
.add(new TermQuery(new Term("clasCet1ErPassage", this.clasCet1ErPassage.toLowerCase())), BooleanClause.Occur.SHOULD)
.build();
}

How to define custom analyzer to do global search with hibernate-search and elasticsearch

I have an implementation of hibernate-search-orm (5.9.0.Final) with hibernate-search-elasticsearch (5.9.0.Final).
I defined a custom analyzer on an entity (see below) and I indexed two entities:
id: "1"
title: "Médiatiques : récit et société"
abstract:...
id: "2"
title: "Mediatique Com'7"
abstract:...
The search works fine when I search on the title field:
"title:médiatique" => 2 results.
"title:mediatique" => 2 results.
My problem is when I do a global search, with or without accents:
search on "médiatique" => 1 result (id:1)
search on "mediatique" => 1 result (id:2)
Is there a way to resolve this?
Thanks.
Entity definition:
@Entity
@Table(name="bibliographic")
@DynamicUpdate
@DynamicInsert
@Indexed(index = "bibliographic")
@FullTextFilterDefs({
@FullTextFilterDef(name = "fieldsElasticsearchFilter",
impl = FieldsElasticsearchFilter.class)
})
@AnalyzerDef(name = "customAnalyzer",
tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
@TokenFilterDef(factory = LowerCaseFilterFactory.class),
@TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
})
@Analyzer(definition = "customAnalyzer")
public class BibliographicHibernate implements Bibliographic {
...
@Column(name="title", updatable = false)
@Fields( {
@Field,
@Field(name = "titleSort", analyze = Analyze.NO, store = Store.YES)
})
@SortableField(forField = "titleSort")
private String title;
...
}
Search method :
FullTextEntityManager ftem = Search.getFullTextEntityManager(entityManager);
QueryBuilder qb = ftem.getSearchFactory().buildQueryBuilder().forEntity(Bibliographic.class).get();
QueryDescriptor q = ElasticsearchQueries.fromQueryString(queryString);
FullTextQuery query = ftem.createFullTextQuery(q, Bibliographic.class).setFirstResult(start).setMaxResults(rows);
if (filters!=null){
filters.stream().map((filter) -> filter.split(":")).forEach((f) -> {
query.enableFullTextFilter("fieldsElasticsearchFilter")
.setParameter("field", f[0])
.setParameter("value", f[1]);
}
);
}
if (facetFields!=null){
facetFields.stream().map((facet) -> facet.split(":")).forEach((f) ->{
query.getFacetManager()
.enableFaceting(qb.facet()
.name(f[0])
.onField(f[0])
.discrete()
.orderedBy(FacetSortOrder.COUNT_DESC)
.includeZeroCounts(false)
.maxFacetCount(10)
.createFacetingRequest() );
}
);
}
List<Bibliographic> bibs = query.getResultList();
To be honest I'm more surprised document 1 would match at all, since there's a trailing "s" on "Médiatiques" and you don't use any stemmer.
You are in a special case here: you are using a query string and passing it directly to Elasticsearch (that's what ElasticsearchQueries.fromQueryString(queryString) does). Hibernate Search has very little impact on the query being run, it only impacts the indexed content and the Elasticsearch mapping here.
When you run a QueryString query on Elasticsearch and you don't specify any field, it uses all fields in the document. I wouldn't bet that the analyzer used when analyzing your query is the same analyzer that you defined on your "title" field. In particular, it may not be removing accents.
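To illustrate why both spellings only match each other when the same normalization is applied at index time and at query time, here is a rough JDK-only stand-in for lowercasing plus accent folding. It is an approximation, not the Lucene implementation: ASCIIFoldingFilter covers more characters than Unicode decomposition does, and the class and method names are invented for illustration.

```java
import java.text.Normalizer;
import java.util.Locale;

final class FoldingSketch {
    private FoldingSketch() {}

    // Decomposes accented characters (NFD), drops the combining marks,
    // then lowercases: a rough equivalent of lowercase + ASCII folding.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // \u00e9 is the letter e with an acute accent, escaped to keep the source ASCII-only.
        String accented = "M\u00e9diatique";
        String plain = "mediatique";

        System.out.println(fold(accented));                    // mediatique
        System.out.println(fold(accented).equals(fold(plain))); // true
        System.out.println(accented.equals(plain));             // false: unanalyzed terms differ
    }
}
```

Note that folding does nothing about the trailing "s" in "Médiatiques"; that is the job of a stemmer.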
An alternative solution would be to build a simple query string query using the QueryBuilder. The syntax of queries is a bit more limited, but is generally enough for end users. The code would look like this:
FullTextEntityManager ftem = Search.getFullTextEntityManager(entityManager);
QueryBuilder qb = ftem.getSearchFactory().buildQueryBuilder().forEntity(Bibliographic.class).get();
Query q = qb.simpleQueryString()
.onFields("title", "abstract")
.matching(queryString)
.createQuery();
FullTextQuery query = ftem.createFullTextQuery(q, Bibliographic.class).setFirstResult(start).setMaxResults(rows);
Users would still be able to target specific fields, but only in the list you provided (which, by the way, is probably safer, otherwise they could target sort fields and so on, which you probably don't want to allow). By default, all the fields in that list would be targeted.
This may lead to the exact same result as the query string, but the advantage is, you can override the analyzer being used for the query. For instance:
FullTextEntityManager ftem = Search.getFullTextEntityManager(entityManager);
QueryBuilder qb = ftem.getSearchFactory().buildQueryBuilder().forEntity(Bibliographic.class)
.overridesForField("title", "customAnalyzer")
.overridesForField("abstract", "customAnalyzer")
.get();
Query q = qb.simpleQueryString()
.onFields("title", "abstract")
.matching(queryString)
.createQuery();
FullTextQuery query = ftem.createFullTextQuery(q, Bibliographic.class).setFirstResult(start).setMaxResults(rows);
... and this will use your analyzer when querying.
As an alternative, you can also use a more advanced JSON query by replacing ElasticsearchQueries.fromQueryString(queryString) with ElasticsearchQueries.fromJsonQuery(json). You will have to craft the JSON yourself, though, taking some precautions to avoid any injection from the user (use Gson to build the JSON), and taking care to follow the Elasticsearch query syntax.
You can find more information about simple query string queries in the official documentation.
Note: you may want to add FrenchMinimalStemFilterFactory to your list of token filters in your custom analyzer. It's not the cause of your problem, but once you manage to use your analyzer in search queries, you will very soon find it useful.

Hibernate search custom stop words list

I need to customize stopwords list for search by Document title.
I have the following mapping:
@Entity
@Indexed
@AnalyzerDef(
name = "documentAnalyzer",
tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
@TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
@TokenFilterDef(factory = LowerCaseFilterFactory.class),
@TokenFilterDef(
factory = StopFilterFactory.class,
params = {
@Parameter(name = "words", value = "stoplist.properties"),
@Parameter(name = "ignoreCase", value = "true")
}
)
}
)
public class Document {
...
@Field(analyzer = @Analyzer(definition = "documentAnalyzer"))
private String title;
...
}
The stoplist.properties file is in the resources directory and contains stopwords that differ from the StandardAnalyzer defaults.
But the search doesn't return any results if I search for a word that is a stopword by default but is not in my stoplist.properties file, e.g. the word "will".
What is wrong with the current configuration?
How can I make Hibernate Search use a custom stopwords list?
I use hibernate-search-orm version 5.6.1.
Results are validated in an integration test, with the index created on the fly:
@Before
public void setUpLuceneIndex() throws InterruptedException {
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
fullTextEntityManager.createIndexer().startAndWait();
}
Your configuration looks sane as far as I can see.
Did you reindex your entities after having changed the stop words configuration? You need that for the new configuration to be taken into account at index time.
If you did and it still does not work, try adding a breakpoint in the StopFilterFactory constructor and in its inform method to see what's going on!
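As a reference point for what to expect while debugging, here is a minimal JDK-only sketch of stop-word filtering with a custom list. It is not the Lucene implementation; the class name, the whitespace tokenization, and the one-word stop list are assumptions made for illustration. A word that is absent from the custom list, such as "will", should survive as a token and therefore be searchable once the index has been rebuilt.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

final class StopWordSketch {
    private StopWordSketch() {}

    // Lowercases the text, splits it on whitespace, and drops every token
    // that appears in the given stop-word set.
    static List<String> analyze(String text, Set<String> stopWords) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\s+"))
                .filter(token -> !token.isEmpty())
                .filter(token -> !stopWords.contains(token))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> customStopWords = Set.of("the"); // hypothetical custom list
        System.out.println(analyze("The Will", customStopWords)); // [will]
    }
}
```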

Lucene : Changing the default facet delimiter?

First post on this wonderful site!
My goal is to use hierarchical facets for searching an index using Lucene. However, my facets need to be delimited by a character other than '/', (in this case, '~'). Example:
Categories
Categories~Category1
Categories~Category2
I have created a class that implements the FacetIndexingParams interface (a copy of DefaultFacetIndexingParams with the DEFAULT_FACET_DELIM_CHAR parameter set to '~').
Paraphrased indexing code : (using FSDirectory for both index and taxonomy)
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34)
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34, analyzer)
IndexWriter writer = new IndexWriter(indexDir, config)
TaxonomyWriter taxo = new LuceneTaxonomyWriter(taxDir, OpenMode.CREATE)
Document doc = new Document()
// Add bunch of Fields... hidden for the sake of brevity
List<CategoryPath> categories = new ArrayList<CategoryPath>()
row.tags.split('\\|').each{ tag ->
def cp = new CategoryPath()
tag.split('~').each{
cp.add(it)
}
categories.add(cp)
}
NewFacetIndexingParams facetIndexingParams = new NewFacetIndexingParams()
DocumentBuilder categoryDocBuilder = new CategoryDocumentBuilder(taxo, facetIndexingParams)
categoryDocBuilder.setCategoryPaths(categories).build(doc)
writer.addDocument(doc)
// Commit and close both writer and taxo.
Search code paraphrased:
// Create index and taxonomy readers to get info from index and taxonomy
IndexReader indexReader = IndexReader.open(indexDir)
TaxonomyReader taxo = new LuceneTaxonomyReader(taxDir)
Searcher searcher = new IndexSearcher(indexReader)
QueryParser parser = new QueryParser(Version.LUCENE_34, "content", new StandardAnalyzer(Version.LUCENE_34))
parser.setAllowLeadingWildcard(true)
Query q = parser.parse(query)
TopScoreDocCollector tdc = TopScoreDocCollector.create(10, true)
List<FacetResult> res = null
NewFacetIndexingParams facetIndexingParams = new NewFacetIndexingParams()
FacetSearchParams facetSearchParams = new FacetSearchParams(facetIndexingParams)
CountFacetRequest cfr = new CountFacetRequest(new CategoryPath(""), 99)
cfr.setDepth(2)
cfr.setSortBy(SortBy.VALUE)
facetSearchParams.addFacetRequest(cfr)
FacetsCollector facetsCollector = new FacetsCollector(facetSearchParams, indexReader, taxo)
def cp = new CategoryPath("Category~Category1", (char)'~')
searcher.search(DrillDown.query(q, cp), MultiCollector.wrap(tdc, facetsCollector))
The results always return a list of facets in the form of "Category/Category1".
I have used the Luke tool to look at the index and it appears the facets are being delimited by the '~' character in the index.
What is the best route to do this? Any help is greatly appreciated!
I have figured out the issue. The search and indexing are working as they are supposed to. It is how I have been getting the facet results that is the issue. I was using :
res = facetsCollector.getFacetResults()
res.each{ result ->
result.getFacetResultNode().getLabel().toString()
}
What I needed to use was :
res = facetsCollector.getFacetResults()
res.each{ result ->
result.getFacetResultNode().getLabel().toString((char)'~')
}
The difference is the parameter passed to the toString function!
Easy to overlook, tough to find.
Hope this helps others.
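The underlying point is that the delimiter is chosen when the path is rendered as a string, not only when it is stored. A minimal JDK-only sketch of that idea follows; it is not the Lucene CategoryPath class, and the class and method names are invented for illustration.

```java
import java.util.List;

final class PathLabelSketch {
    private PathLabelSketch() {}

    // Renders the path components with an explicit delimiter.
    static String render(List<String> components, char delimiter) {
        return String.join(String.valueOf(delimiter), components);
    }

    // Renders with '/' when no delimiter is given, mirroring the
    // default behavior described in the question.
    static String render(List<String> components) {
        return render(components, '/');
    }

    public static void main(String[] args) {
        List<String> path = List.of("Categories", "Category1");
        System.out.println(render(path));      // Categories/Category1
        System.out.println(render(path, '~')); // Categories~Category1
    }
}
```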
