How to search fields with wildcard and spaces in Hibernate Search - java

I have a search box that performs a search on title field based on the given input, so the user has recommended all available titles starting with the text inserted.It is based on Lucene and Hibernate Search. It works fine until space is entered. Then the result disapear. For example, I want "Learning H" to give me "Learning Hibernate" as the result. However, this doesn't happen. could you please advice me what should I use here instead.
Query Builder:
QueryBuilder qBuilder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity(LearningGoal.class).get();
Query query = qBuilder.keyword().wildcard().onField("title")
.matching(searchString + "*").createQuery();
BooleanQuery bQuery = new BooleanQuery();
bQuery.add(query, BooleanClause.Occur.MUST);
for (LearningGoal exGoal : existingGoals) {
Term omittedTerm = new Term("id", String.valueOf(exGoal.getId()));
bQuery.add(new TermQuery(omittedTerm), BooleanClause.Occur.MUST_NOT);
}
#SuppressWarnings("unused")
org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery(
query, LearningGoal.class);
Hibernate class:
#AnalyzerDef(name = "searchtokenanalyzer",tokenizer = #TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
#TokenFilterDef(factory = StandardFilterFactory.class),
#TokenFilterDef(factory = LowerCaseFilterFactory.class),
#TokenFilterDef(factory = StopFilterFactory.class,params = {
#Parameter(name = "ignoreCase", value = "true") }) })
#Analyzer(definition = "searchtokenanalyzer")
public class LearningGoal extends Node {

I found workaround for this problem. The idea is to tokenize input string and remove stop words. For the last token I created a query using keyword wildcard, and for the all previous words I created a TermQuery. Here is the full code
BooleanQuery bQuery = new BooleanQuery();
Session session = persistence.currentManager();
FullTextSession fullTextSession = Search.getFullTextSession(session);
Analyzer analyzer = fullTextSession.getSearchFactory().getAnalyzer("searchtokenanalyzer");
QueryParser parser = new QueryParser(Version.LUCENE_35, "title", analyzer);
String[] tokenized=null;
try {
Query query= parser.parse(searchString);
String cleanedText=query.toString("title");
tokenized = cleanedText.split("\\s");
} catch (ParseException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
QueryBuilder qBuilder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity(LearningGoal.class).get();
for(int i=0;i<tokenized.length;i++){
if(i==(tokenized.length-1)){
Query query = qBuilder.keyword().wildcard().onField("title")
.matching(tokenized[i] + "*").createQuery();
bQuery.add(query, BooleanClause.Occur.MUST);
}else{
Term exactTerm = new Term("title", tokenized[i]);
bQuery.add(new TermQuery(exactTerm), BooleanClause.Occur.MUST);
}
}
for (LearningGoal exGoal : existingGoals) {
Term omittedTerm = new Term("id", String.valueOf(exGoal.getId()));
bQuery.add(new TermQuery(omittedTerm), BooleanClause.Occur.MUST_NOT);
}
org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery(
bQuery, LearningGoal.class);

SQL uses different wildcards than any terminal. In SQL '%' replaces zero or more occurrences of any character (in the terminal you use '*' instead), and the underscore '_' replaces exactly one character (in the terminal you use '?' instead). Hibernate doesn't translate the wildcard characters.
So in the second line you have to replace matching(searchString + "*") with
matching(searchString + "%")

Related

How to do a like query with a variable set of strings?

I am triying to do a "like query" with a variable set of strings, in order to retrieve in a single query all texts that contains a set of words, that is:
public long countByTextLike(Set<String> strings) {
CriteriaBuilder builder = manager.getCriteriaBuilder();
CriteriaQuery<Long> query = builder.createQuery(Long.class);
Root<Example> root = query.from(Example.class);
query.select(builder.count(root.get("id"))).where(
builder.and(
builder.equal(root.get("lang"), "EN")
)
);
//this does not work
for (String word : strings) {
query.where(builder.or(builder.like(root.get("text"), word)));
}
return manager.createQuery(query).getSingleResult();
}
unfortunately this does not work because the where is overwritten in each loop. Only the last word of loop is used and "AND" restictions are being overwriten.
How is possible to do a "like query" with a variable number of strings? It is not posible?
I am using the spring framework but i think that the question could be extendable to hibernate
You can use predicates, and then add them all with only one where clause
public long countByTextLike(Set<String> strings) {
CriteriaBuilder builder = currentSession().getCriteriaBuilder();
CriteriaQuery<Long> query = builder.createQuery(Long.class);
Root<Example> root = query.from(Example.class);
Predicate[] predicates = new Predicate[strings.size()];
query.select(builder.count(root.get("id")));
Predicate langPredicate = builder.equal(root.get("lang"), "EN");
int cont = 0;
for (String word : strings) {
Predicate pred = builder.like(root.get("text"), "%" + word + "%");
predicates[cont++] = pred;
}
Predicate orPredicate = builder.or(predicates);
Predicate finalPredicate = builder.and(orPredicate, langPredicate);
return manager.createQuery(query).where(finalPredicate).getSingleResult();
}

Trying to get more matches with lucene

I'm using Java and lucene to match each song of a list I receive from a service, with local files. What I'm currently struggling with, is finding a query that will get me the greatest amount of matches per song possible. If I could get at least one matching file per song, it would be great.
This is what I have atm:
public List<String> getMatchesForSong(String artist, String title, String album) throws ParseException, IOException {
StandardAnalyzer analyzer = new StandardAnalyzer();
String defaultQuery = "(title: \"%s\"~2) AND ((artist: \"%s\") OR (album: \"%s\"))";
String searchQuery = String.format(defaultQuery, title, artist, album);
Query query = new QueryParser("title", analyzer).parse(searchQuery);
if (indexWriter == null) {
indexWriter = createIndexWriter(indexDir);
indexSearcher = createIndexSearcher(indexWriter);
}
TopDocs topDocs = indexSearcher.search(query, 20);
if (topDocs.totalHits > 0) {
return parseScoreDocsList(topDocs.scoreDocs);
}
return null;
}
This works very well when there are no inconsistencies, even for non-English characters. But it will not return me a single match, for example, if I receive a song with the title "The Sun Was In My Eyes: Part One", but my corresponding file has the title "The Sun Was In My Eyes: Part 1", or if I receive it like "Pt. 1".
I don't get matches either, when the titles have more words than the corresponding files, like "The End of all Times (Martyrs Fire)" opposed to "The End of all Times". Could happen for albums names too.
So, what I'd like to know is what improvements should I make in my code, in order to get more matches.
So I eventually found out that using a PhraseQuery for the title or album, isn't the best approach, since that would cause lucene to search for an exact mach of such phrase.
What I ended up doing was making a TermQuery for each of the words, of both the title and album, and join everything in a BooleanQuery.
private Query parseQueryForSong(String artist, String title, String album) throws ParseException {
String[] artistArr = artist.split(" ");
String[] titleArr = sanitizePhrase(title).split(" ");
String[] albumArr = sanitizePhrase(album).split(" ");
BooleanQuery.Builder mainQueryBuilder = new BooleanQuery.Builder();
BooleanQuery.Builder albumQueryBuilder = new BooleanQuery.Builder();
PhraseQuery artistQuery = new PhraseQuery("artist", artistArr);
for (String titleWord : titleArr) {
if (!titleWord.isEmpty()) {
mainQueryBuilder.add(new TermQuery(new Term("title", titleWord)), BooleanClause.Occur.SHOULD);
}
}
for (String albumWord : albumArr) {
if (!albumWord.isEmpty()) {
albumQueryBuilder.add(new TermQuery(new Term("album", albumWord)), BooleanClause.Occur.SHOULD);
}
}
mainQueryBuilder.add(artistQuery, BooleanClause.Occur.MUST);
mainQueryBuilder.add(albumQueryBuilder.build(), BooleanClause.Occur.MUST);
StandardAnalyzer analyzer = new StandardAnalyzer();
Query mainQuery = new QueryParser("title", analyzer).parse(mainQueryBuilder.build().toString());
return mainQuery;
}

Hibernate search highlighting not analyzed fields

I'd like to highlight the whole not analyzed fields if they match the search query.
The indexed entity looks as follows:
#Entity
#Indexed
#AnalyzerDef(
name = "documentAnalyzer",
tokenizer = #TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
#TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
#TokenFilterDef(factory = LowerCaseFilterFactory.class),
#TokenFilterDef(
factory = StopFilterFactory.class,
params = {
#Parameter(name = "words", value = "stoplist.properties"),
#Parameter(name = "ignoreCase", value = "true")
}
)
}
)
public class Document {
...
#Field(analyze = Analyze.NO)
private String notAnalyzedField; // has "x-xxx-xxx" format
#Field(analyze = Analyze.YES)
private String analyzedField;
}
Suppose I have a Document with notAnalyzedField: "a-bbb-ccc", then I run a search query with the same value and highlight search results using the following code:
String highlightText(Query query, Analyzer analyzer, String fieldName, String text) {
QueryScorer queryScorer = new QueryScorer(query);
SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<span>", "</span>");
Highlighter highlighter = new Highlighter(formatter, queryScorer);
return highlighter.getBestFragment(analyzer, fieldName, text);
}
As a result I get the following snippet:"a-<span>bbb</span>-<span>ccc</span>".
And it seems reasonable because the analyzer treats a symbol as a stop word and - as a delimiter and doesn't highlight them. But I cannot figure out how I can avoid using analyzer while highlighting this field. There are a few methods in Highlighter class that require TokenStream instead of Analyzer but I'm not sure how to use them.
A result I want to achieve is the whole highlighted field: "<span>a-bbb-ccc</span>"
Is there a way to achieve this with hibernate-search?
Where does your analyzer come from?
You might want to get it from Hibernate Search:
FullTextEntityManager em = /*...*/;
Analyzer analyzer = em.getSearchFactory()
.getAnalyzer(Document.class);
highlightText(query, analyzer, fieldName, text);
If it doesn't work, try using a KeywordAnalyzer: highlightText(query, new KeywordAnalyzer(), fieldName, text);

Prefix search using lucene

I am trying to do autocomplete using lucene search functionality. I have the following code which searches by the query prefix but along with that it also gives me all the sentences containing that word while I want it to display only sentence or word starting exactly with that prefix.
ex: m
--holiday mansion houseboat
--eye muscles
--movies of all time
--machine
I want it to show only last 2 queries. How to do it am stucked here also I am new to lucene. Please can any one help me in this. Thanks in advance.
addDoc(IndexWriter w, String title, String isbn) throws IOException {
Document doc = new Document();
doc.add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
// use a string field for isbn because we don't want it tokenized
doc.add(new Field("isbn", isbn, Field.Store.YES, Field.Index.ANALYZED));
w.addDocument(doc);
}
Main:
try {
// 0. Specify the analyzer for tokenizing text.
// The same analyzer should be used for indexing and searching
StandardAnalyzer analyzer = new StandardAnalyzer();
// 1. create the index
Directory index = FSDirectory.open(new File(indexDir));
IndexWriter writer = new IndexWriter(index, new StandardAnalyzer(Version.LUCENE_30), true, IndexWriter.MaxFieldLength.UNLIMITED); //3
for (int i = 0; i < source.size(); i++) {
addDoc(writer, source.get(i), + (i + 1) + "z");
}
writer.close();
// 2. query
Term term = new Term("title", querystr);
//create the term query object
PrefixQuery query = new PrefixQuery(term);
// 3. search
int hitsPerPage = 20;
IndexReader reader = IndexReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
searcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
// 4. Get results
for (int i = 0; i < hits.length; ++i) {
int docId = hits[i].doc;
Document d = searcher.doc(docId);
System.out.println(d.get("title"));
}
reader.close();
} catch (Exception e) {
System.out.println("Exception (LuceneAlgo.getSimilarString()) : " + e);
}
}
}
I see two solutions:
as suggested by Yahnoosh, save the title field twice, Once as TextField (=analyzed) and once as StringField (not analyzed)
save it just as TextField, but When Querying use SpanFirstQuery
// 2. query
Term term = new Term("title", querystr);
//create the term query object
PrefixQuery pq = new PrefixQuery(term);
SpanQuery wrapper = new SpanMultiTermQueryWrapper<PrefixQuery>(pq);
Query final = new SpanFirstQuery(wrapper, 1);
If I understand your scenario correctly, you want to autocomplete on the title field.
The solution is to have two fields: one analyzed, to enable querying over it, one non-analyzed to have titles indexed without breaking them into individual terms.
Your autocomplete logic should issue prefix queries against the non-analyzed field to match only on the first word. Your term queries should be issued against the analyzed field for matches within the title.
I hope that makes sense.

search with in date range lucene and AND operator

I want to make a query which will give me data between date range and also by one more AND condition in lucene 3.0.1. This is the code for query between two dates :
IndexSearcher searcher = new IndexSearcher(directory);
String lowerDate = "2013-06-27";
String upperDate = "2013-06-29";
boolean includeLower = true;
boolean includeUpper = true;
TermRangeQuery query = new TermRangeQuery("created_at",lowerDate, upperDate, includeLower, includeUpper);
// display search results
TopDocs topDocs = searcher.search(query, 10);
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
Document doc = searcher.doc(scoreDoc.doc);
System.out.println(doc.get("id"));
}
I have one more indexed column text, how can I include one more AND condition with this query, I am trying to get results within date range which also contain some keyword in test column.
You need to use a BooleanQuery, like:
TermRangeQuery dateQuery = new TermRangeQuery("created_at",lowerDate, upperDate, includeLower, includeUpper);
TermQuery keywordQuery = new TermQuery(new Term("keyword", "term"));
BooleanQuery bq = new BooleanQuery();
bq.add(new BooleanClause(dateQuery, BooleanClause.Occur.MUST))
bq.add(new BooleanClause(keywordQuery, BooleanClause.Occur.MUST))
// display search results
TopDocs topDocs = searcher.search(bq, 10);
Combining the two clauses, each with BooleanClause.Occur.MUST, is equivalent to an "AND" (take a look at the descriptions of the "MUST", "SHOULD" and "MUST_NOT" in the BooleanClause.Occur documentation to better understand your options with Lucene's "boolean" logic).

Categories