Search within a date range in Lucene with an AND operator - Java

I want to build a query in Lucene 3.0.1 that returns documents within a date range and that also satisfy one additional AND condition. This is the code for the query between two dates:
IndexSearcher searcher = new IndexSearcher(directory);
String lowerDate = "2013-06-27";
String upperDate = "2013-06-29";
boolean includeLower = true;
boolean includeUpper = true;
TermRangeQuery query = new TermRangeQuery("created_at",lowerDate, upperDate, includeLower, includeUpper);
// display search results
TopDocs topDocs = searcher.search(query, 10);
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
Document doc = searcher.doc(scoreDoc.doc);
System.out.println(doc.get("id"));
}
I have one more indexed column, text. How can I add one more AND condition to this query? I am trying to get results within the date range that also contain some keyword in the text column.

You need to use a BooleanQuery, like:
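// Range clause on the date field (lowerDate, upperDate, includeLower and includeUpper as defined in the question)
TermRangeQuery dateQuery = new TermRangeQuery("created_at", lowerDate, upperDate, includeLower, includeUpper);
// Keyword clause; replace "keyword" and "term" with your own field name and search term
TermQuery keywordQuery = new TermQuery(new Term("keyword", "term"));
// Both clauses are MUST, so a document has to satisfy both of them
BooleanQuery bq = new BooleanQuery();
bq.add(new BooleanClause(dateQuery, BooleanClause.Occur.MUST));
bq.add(new BooleanClause(keywordQuery, BooleanClause.Occur.MUST));
// display search results
TopDocs topDocs = searcher.search(bq, 10);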
Combining the two clauses, each with BooleanClause.Occur.MUST, is equivalent to an "AND" (take a look at the descriptions of the "MUST", "SHOULD" and "MUST_NOT" in the BooleanClause.Occur documentation to better understand your options with Lucene's "boolean" logic).
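The range clause compares created_at values as plain strings, which works for zero-padded yyyy-MM-dd dates because their lexicographic order matches their chronological order. Here is a minimal sketch of that property combined with the AND logic, using only the Java standard library. The class name, method names and row layout are hypothetical and no Lucene classes are involved:

```java
import java.util.ArrayList;
import java.util.List;

class DateRangeAndKeyword {
    // Inclusive string-range check, mirroring includeLower = includeUpper = true.
    public static boolean inRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    // Keeps the ids of rows whose date is in range AND whose text contains the keyword.
    // Each row is {id, created_at, text}; this layout is an assumption made for the sketch.
    public static List<String> filter(String[][] rows, String lower, String upper, String keyword) {
        List<String> ids = new ArrayList<>();
        for (String[] row : rows) {
            if (inRange(row[1], lower, upper) && row[2].contains(keyword)) {
                ids.add(row[0]);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"1", "2013-06-26", "lucene range query"},
            {"2", "2013-06-27", "lucene boolean query"},
            {"3", "2013-06-28", "unrelated text"},
            {"4", "2013-06-29", "more lucene"},
        };
        // prints [2, 4]
        System.out.println(filter(rows, "2013-06-27", "2013-06-29", "lucene"));
    }
}
```

Row 1 fails the range test and row 3 fails the keyword test, which is the same outcome two MUST clauses produce. Dates stored in a format that is not zero-padded or not ordered year-month-day would not sort correctly as strings.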

Related

Trying to get more matches with lucene

I'm using Java and lucene to match each song of a list I receive from a service, with local files. What I'm currently struggling with, is finding a query that will get me the greatest amount of matches per song possible. If I could get at least one matching file per song, it would be great.
This is what I have at the moment:
public List<String> getMatchesForSong(String artist, String title, String album) throws ParseException, IOException {
StandardAnalyzer analyzer = new StandardAnalyzer();
String defaultQuery = "(title: \"%s\"~2) AND ((artist: \"%s\") OR (album: \"%s\"))";
String searchQuery = String.format(defaultQuery, title, artist, album);
Query query = new QueryParser("title", analyzer).parse(searchQuery);
if (indexWriter == null) {
indexWriter = createIndexWriter(indexDir);
indexSearcher = createIndexSearcher(indexWriter);
}
TopDocs topDocs = indexSearcher.search(query, 20);
if (topDocs.totalHits > 0) {
return parseScoreDocsList(topDocs.scoreDocs);
}
return null;
}
This works very well when there are no inconsistencies, even for non-English characters. But it will not return me a single match, for example, if I receive a song with the title "The Sun Was In My Eyes: Part One", but my corresponding file has the title "The Sun Was In My Eyes: Part 1", or if I receive it like "Pt. 1".
I don't get matches either, when the titles have more words than the corresponding files, like "The End of all Times (Martyrs Fire)" opposed to "The End of all Times". Could happen for albums names too.
So, what I'd like to know is what improvements should I make in my code, in order to get more matches.
I eventually found out that using a PhraseQuery for the title or album isn't the best approach, since it makes Lucene search for an exact match of the phrase.
What I ended up doing was creating a TermQuery for each word of both the title and the album, and joining everything in a BooleanQuery.
private Query parseQueryForSong(String artist, String title, String album) throws ParseException {
String[] artistArr = artist.split(" ");
String[] titleArr = sanitizePhrase(title).split(" ");
String[] albumArr = sanitizePhrase(album).split(" ");
BooleanQuery.Builder mainQueryBuilder = new BooleanQuery.Builder();
BooleanQuery.Builder albumQueryBuilder = new BooleanQuery.Builder();
PhraseQuery artistQuery = new PhraseQuery("artist", artistArr);
for (String titleWord : titleArr) {
if (!titleWord.isEmpty()) {
mainQueryBuilder.add(new TermQuery(new Term("title", titleWord)), BooleanClause.Occur.SHOULD);
}
}
for (String albumWord : albumArr) {
if (!albumWord.isEmpty()) {
albumQueryBuilder.add(new TermQuery(new Term("album", albumWord)), BooleanClause.Occur.SHOULD);
}
}
mainQueryBuilder.add(artistQuery, BooleanClause.Occur.MUST);
mainQueryBuilder.add(albumQueryBuilder.build(), BooleanClause.Occur.MUST);
StandardAnalyzer analyzer = new StandardAnalyzer();
Query mainQuery = new QueryParser("title", analyzer).parse(mainQueryBuilder.build().toString());
return mainQuery;
}
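The snippet above calls sanitizePhrase without showing it. The following is only a guess at what such a helper might do, not the author's implementation: it replaces punctuation with spaces, collapses whitespace and lower-cases the result. Lower-casing is included because StandardAnalyzer lower-cases tokens at index time, while the terms of a raw TermQuery are not analyzed. The class name is hypothetical.

```java
import java.util.Locale;

class PhraseSanitizer {
    // Hypothetical helper: keeps letters and digits (including non-English ones),
    // turns everything else into single spaces, and lower-cases the result.
    public static String sanitizePhrase(String phrase) {
        if (phrase == null) {
            return "";
        }
        return phrase
                .replaceAll("[^\\p{L}\\p{N}\\s]", " ") // punctuation -> space
                .trim()
                .replaceAll("\\s+", " ")               // collapse runs of whitespace
                .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // prints: the end of all times martyrs fire
        System.out.println(sanitizePhrase("The End of all Times (Martyrs Fire)"));
        // prints: the sun was in my eyes part one
        System.out.println(sanitizePhrase("The Sun Was In My Eyes: Part One"));
    }
}
```

Splitting the sanitized string on a single space then yields the per-word terms that are added as SHOULD clauses.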

Prefix search using lucene

I am trying to implement autocomplete using Lucene's search functionality. The following code searches by the query prefix, but it also returns every sentence containing a word that starts with the prefix, whereas I want only the sentences or words that themselves start with the prefix.
ex: m
--holiday mansion houseboat
--eye muscles
--movies of all time
--machine
I want it to show only the last two results. I am stuck here and I am new to Lucene. Can anyone help me with this? Thanks in advance.
private static void addDoc(IndexWriter w, String title, String isbn) throws IOException {
Document doc = new Document();
doc.add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
// index isbn as a single untokenized term because we don't want it tokenized
doc.add(new Field("isbn", isbn, Field.Store.YES, Field.Index.NOT_ANALYZED));
w.addDocument(doc);
}
Main:
try {
// 0. Specify the analyzer for tokenizing text.
// The same analyzer should be used for indexing and searching
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
// 1. create the index
Directory index = FSDirectory.open(new File(indexDir));
IndexWriter writer = new IndexWriter(index, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
for (int i = 0; i < source.size(); i++) {
// the isbn is just the 1-based position followed by "z"
addDoc(writer, source.get(i), (i + 1) + "z");
}
writer.close();
// 2. query
Term term = new Term("title", querystr);
//create the term query object
PrefixQuery query = new PrefixQuery(term);
// 3. search
int hitsPerPage = 20;
IndexReader reader = IndexReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
searcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
// 4. Get results
for (int i = 0; i < hits.length; ++i) {
int docId = hits[i].doc;
Document d = searcher.doc(docId);
System.out.println(d.get("title"));
}
reader.close();
} catch (Exception e) {
System.out.println("Exception (LuceneAlgo.getSimilarString()) : " + e);
}
}
}
I see two solutions:
1. As suggested by Yahnoosh, save the title field twice: once as a TextField (analyzed) and once as a StringField (not analyzed).
2. Save it only as a TextField, but use a SpanFirstQuery when querying:
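// 2. query
Term term = new Term("title", querystr);
// create the prefix query object
PrefixQuery pq = new PrefixQuery(term);
// wrap it as a span query so that its position can be constrained
SpanQuery wrapper = new SpanMultiTermQueryWrapper<PrefixQuery>(pq);
// "final" is a reserved word in Java, so the variable needs another name;
// an end position of 1 restricts matches to the first term of the field
Query firstTermQuery = new SpanFirstQuery(wrapper, 1);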
If I understand your scenario correctly, you want to autocomplete on the title field.
The solution is to have two fields: one analyzed, to enable querying over it, one non-analyzed to have titles indexed without breaking them into individual terms.
Your autocomplete logic should issue prefix queries against the non-analyzed field to match only on the first word. Your term queries should be issued against the analyzed field for matches within the title.
I hope that makes sense.
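To see why the two fields behave differently, here is a small sketch in plain Java that imitates both cases without Lucene. A prefix query on an analyzed field can match any token of the title, whereas a prefix query on a non-analyzed field can only match the start of the whole title. The class and method names are hypothetical, and the case-insensitive comparison assumes that titles in the non-analyzed field were lower-cased before indexing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class PrefixMatchDemo {
    // Imitates a prefix query on an analyzed field: any token may start with the prefix.
    public static List<String> matchAnyToken(List<String> titles, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String title : titles) {
            for (String token : title.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (token.startsWith(p)) {
                    hits.add(title);
                    break; // one matching token is enough
                }
            }
        }
        return hits;
    }

    // Imitates a prefix query on a non-analyzed field: the whole title is one term.
    public static List<String> matchWholeTitle(List<String> titles, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String title : titles) {
            if (title.toLowerCase(Locale.ROOT).startsWith(p)) {
                hits.add(title);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "holiday mansion houseboat", "eye muscles", "movies of all time", "machine");
        // prints all four titles
        System.out.println(matchAnyToken(titles, "m"));
        // prints [movies of all time, machine]
        System.out.println(matchWholeTitle(titles, "m"));
    }
}
```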

How to refine a search using an Apache Lucene index

I am searching for a keyword using an index created by Apache Lucene. The search returns the names of the files that contain the keyword. Now I want to refine the search by searching again only within the files returned by the first Lucene search. How can I refine a search like this with Apache Lucene?
I am using the following code.
try
{
File indexDir=new File("path upto the index directory");
Directory directory = FSDirectory.open(indexDir);
IndexSearcher searcher = new IndexSearcher(directory, true);
QueryParser parser = new QueryParser(Version.LUCENE_36, "contents", new SimpleAnalyzer(Version.LUCENE_36));
Query query = parser.parse(qu);
query.setBoost((float) 1.5);
TopDocs topDocs = searcher.search(query, maxhits);
ScoreDoc[] hits = topDocs.scoreDocs;
len = hits.length;
int docId = 0;
Document d;
for ( i = 0; i<len; i++) {
docId = hits[i].doc;
d = searcher.doc(docId);
filename= d.get(("filename"));
}
}
catch(Exception ex){ex.printStackTrace();}
I have added documents to the Lucene index with the fields contents and filename.
You want to use a BooleanQuery for something like this. That will let you AND the original search constraints with the refined search constraints.
Example:
BooleanQuery query = new BooleanQuery();
Query origSearch = getOrigSearch(searchString);
Query refinement = makeRefinement();
query.add(origSearch, Occur.MUST);
query.add(refinement, Occur.MUST);
TopDocs topDocs = searcher.search(query, maxHits);

How to search fields with wildcard and spaces in Hibernate Search

I have a search box that searches the title field based on the given input, so that the user is offered all available titles starting with the entered text. It is based on Lucene and Hibernate Search. It works fine until a space is entered; then the results disappear. For example, I want "Learning H" to give me "Learning Hibernate" as a result, but this doesn't happen. Could you please advise me on what I should use here instead?
Query Builder:
QueryBuilder qBuilder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity(LearningGoal.class).get();
Query query = qBuilder.keyword().wildcard().onField("title")
.matching(searchString + "*").createQuery();
BooleanQuery bQuery = new BooleanQuery();
bQuery.add(query, BooleanClause.Occur.MUST);
for (LearningGoal exGoal : existingGoals) {
Term omittedTerm = new Term("id", String.valueOf(exGoal.getId()));
bQuery.add(new TermQuery(omittedTerm), BooleanClause.Occur.MUST_NOT);
}
@SuppressWarnings("unused")
org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery(
query, LearningGoal.class);
Hibernate class:
@AnalyzerDef(name = "searchtokenanalyzer",tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
@TokenFilterDef(factory = StandardFilterFactory.class),
@TokenFilterDef(factory = LowerCaseFilterFactory.class),
@TokenFilterDef(factory = StopFilterFactory.class,params = {
@Parameter(name = "ignoreCase", value = "true") }) })
@Analyzer(definition = "searchtokenanalyzer")
public class LearningGoal extends Node {
I found a workaround for this problem. The idea is to tokenize the input string and remove stop words. For the last token I create a query using a keyword wildcard, and for all the previous words I create a TermQuery. Here is the full code:
BooleanQuery bQuery = new BooleanQuery();
Session session = persistence.currentManager();
FullTextSession fullTextSession = Search.getFullTextSession(session);
Analyzer analyzer = fullTextSession.getSearchFactory().getAnalyzer("searchtokenanalyzer");
QueryParser parser = new QueryParser(Version.LUCENE_35, "title", analyzer);
String[] tokenized=null;
try {
Query query= parser.parse(searchString);
String cleanedText=query.toString("title");
tokenized = cleanedText.split("\\s");
} catch (ParseException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
QueryBuilder qBuilder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity(LearningGoal.class).get();
for(int i=0;i<tokenized.length;i++){
if(i==(tokenized.length-1)){
Query query = qBuilder.keyword().wildcard().onField("title")
.matching(tokenized[i] + "*").createQuery();
bQuery.add(query, BooleanClause.Occur.MUST);
}else{
Term exactTerm = new Term("title", tokenized[i]);
bQuery.add(new TermQuery(exactTerm), BooleanClause.Occur.MUST);
}
}
for (LearningGoal exGoal : existingGoals) {
Term omittedTerm = new Term("id", String.valueOf(exGoal.getId()));
bQuery.add(new TermQuery(omittedTerm), BooleanClause.Occur.MUST_NOT);
}
org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery(
bQuery, LearningGoal.class);
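The core of the workaround above is the rule "every token except the last is an exact term, the last token is a prefix". Here is a minimal sketch of just that rule in plain Java, returning clause descriptions as strings instead of real Lucene queries. The class name, the clause string format and the tiny hard-coded stop-word list are assumptions made for illustration; the original code relies on the configured analyzer for tokenizing and stop-word removal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AutocompleteClauses {
    // Stand-in for the analyzer's stop filter (assumed list, not Lucene's).
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Describes the MUST clauses that would be built for the given input.
    public static List<String> plan(String input) {
        List<String> tokens = new ArrayList<>();
        for (String raw : input.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            boolean last = (i == tokens.size() - 1);
            // last token: wildcard (prefix) clause; earlier tokens: exact term clause
            clauses.add("MUST title:" + tokens.get(i) + (last ? "*" : ""));
        }
        return clauses;
    }

    public static void main(String[] args) {
        // prints [MUST title:learning, MUST title:h*]
        System.out.println(plan("Learning H"));
    }
}
```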
SQL uses different wildcards than any terminal. In SQL '%' replaces zero or more occurrences of any character (in the terminal you use '*' instead), and the underscore '_' replaces exactly one character (in the terminal you use '?' instead). Hibernate doesn't translate the wildcard characters.
So in the second line you have to replace matching(searchString + "*") with
matching(searchString + "%")

Indexing and Searching Date in Lucene

I tried indexing a date with the DateTools.dateToString() method. It works properly for both indexing and searching.
But my already indexed data, which has some references, stores the date as new Date().getTime().
So my problem is how to perform a range search query on this data.
Is there any solution to this?
Thanks in advance.
You need to use a TermRangeQuery on your date field. That field always needs to be indexed with DateTools.dateToString() for it to work properly. Here's a full example of indexing and searching on a date range with Lucene 3.0:
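import java.util.Date;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.DateTools.Resolution;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermRangeQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
public class LuceneDateRange {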
public static void main(String[] args) throws Exception {
// setup Lucene to use an in-memory index
Directory directory = new RAMDirectory();
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
MaxFieldLength mlf = MaxFieldLength.UNLIMITED;
IndexWriter writer = new IndexWriter(directory, analyzer, true, mlf);
// use the current time as the base of dates for this example
long baseTime = System.currentTimeMillis();
// index 10 documents with 1 second between dates
for (int i = 0; i < 10; i++) {
Document doc = new Document();
String id = String.valueOf(i);
String date = buildDate(baseTime + i * 1000);
doc.add(new Field("id", id, Store.YES, Index.NOT_ANALYZED));
doc.add(new Field("date", date, Store.YES, Index.NOT_ANALYZED));
writer.addDocument(doc);
}
writer.close();
// search for documents from 5 to 8 seconds after base, inclusive
IndexSearcher searcher = new IndexSearcher(directory);
String lowerDate = buildDate(baseTime + 5000);
String upperDate = buildDate(baseTime + 8000);
boolean includeLower = true;
boolean includeUpper = true;
TermRangeQuery query = new TermRangeQuery("date",
lowerDate, upperDate, includeLower, includeUpper);
// display search results
TopDocs topDocs = searcher.search(query, 10);
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
Document doc = searcher.doc(scoreDoc.doc);
System.out.println(doc);
}
}
public static String buildDate(long time) {
return DateTools.dateToString(new Date(time), Resolution.SECOND);
}
}
You'll get much better search performance if you use a NumericField for your date, and then NumericRangeFilter/Query to do the range search.
You just have to encode your date as a long or an int. One simple way is to call the .getTime() method of your Date, but this may give far more resolution (milliseconds) than you need. If you only need day resolution, you can encode the date as a YYYYMMDD integer.
Then, at search time, do the same conversion on your start/end Dates and run NumericRangeQuery/Filter.
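The YYYYMMDD encoding mentioned above can be sketched with the standard library alone. The class and method names are hypothetical, and converting a java.util.Date in UTC is an assumption; use whichever time zone your data is defined in, as long as indexing and searching use the same one.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;

class DateEncoding {
    // Encodes a calendar date as an int such as 20130627; numeric order equals date order.
    public static int toYyyyMmDd(LocalDate date) {
        return date.getYear() * 10000 + date.getMonthValue() * 100 + date.getDayOfMonth();
    }

    // Same encoding for a java.util.Date, interpreted in UTC (assumption).
    public static int toYyyyMmDd(Date date) {
        return toYyyyMmDd(date.toInstant().atZone(ZoneOffset.UTC).toLocalDate());
    }

    public static void main(String[] args) {
        // prints 20130627
        System.out.println(toYyyyMmDd(LocalDate.of(2013, 6, 27)));
        // prints 19700101
        System.out.println(toYyyyMmDd(new Date(0L)));
    }
}
```

The same function would be applied to the start and end dates at search time, so that the bounds of the numeric range use the encoding that was indexed.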
