Apache Lucene Sort Issues with GAE-Lucene addDocuments

I have been trying to get Sort working with Apache Lucene on Google App Engine. I am using https://github.com/UltimaPhoenix/luceneappengine to integrate Lucene with GAE. Here is what I am doing:
I have a list of Documents, which I add to Lucene through the IndexWriter's addDocuments() method.
for (Object object : objects) {
    Document doc = new Document();
    doc.add(new Field("id", generateDocId(object), idType));
    doc.add(new NumericDocValuesField("sortLong", <Long Value>));
    documents.add(doc);
}
I am basically aggregating all the documents into a list and writing them to the index using:
IndexWriter writer = getWriter();
writer.addDocuments(documents);
I am then trying to query a few documents, using a Query together with a Sort:
Sort sort = new Sort(new SortField("sortLong", SortField.Type.LONG, true));
TopFieldDocs docs = searcher.search(new MatchAllDocsQuery(),2000,sort);
Problem:
When I use addDocuments() to bulk index the documents, my sorted queries do not return the data in the correct sort order. However, if I index each document using addDocument(), the sorted queries work correctly.
This has led me to deduce that there is something inherently wrong with addDocuments(). The sort won't work unless I open the IndexWriter, call addDocument() and close the IndexWriter for each document, which I am unwilling to do because I have many thousands of records to index.
Is there any solution for this problem? Or is it a known defect?

How can I get the highlights of my result set in Hibernate search 6?

I am using the Hibernate Search 6 Lucene backend in my Java application.
There are various search operations I am performing, including a fuzzy search.
I get search results without any issues.
Now I want to show what caused each result in my result list to be picked.
Let's say the search keyword is "test", and the fuzzy search is performed on the fields "name", "description", "Id", etc., and I get 10 results in a List. Now I want to highlight the values in the fields of each result that caused it to match.
E.g., consider the item below to be one of the items in the search result List. (For clarity I have written it in JSON format.)
{
name:"ABC some test name",
description: "this is a test element",
id: "abc123"
}
As the result suggests, it was picked as a search result because the keyword "test" appears in both the "name" and "description" fields. I want to highlight those specific fields in the frontend when I show the search results.
Currently, I am retrieving search results through a Java REST API for my Angular frontend. How can I get those specific fields and their values using Hibernate Search 6 in my Java application?
So far I have gone through the Hibernate Search 6 documentation and found nothing (https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#preface). I have also looked at seemingly related issues on the web over the past week and found nothing so far. It seems my requirement is a little specific, which is why I need your help here.
Highlighting is not yet implemented in Hibernate Search, see HSEARCH-2192.
That being said, you can leverage native Elasticsearch / Lucene APIs.
With Elasticsearch it's relatively easy: you can use a request transformer to add a highlight element to the HTTP request, then use the jsonHit projection to retrieve the JSON for each hit, which contains a highlight element that includes the highlighted fields and the highlighted fragments.
With Lucene it would be more complex and you'll have to rely on unsupported features, but that's doable.
Retrieve the Lucene Query from your Hibernate Search predicate:
SearchPredicate predicate = ...;
Query query = LuceneMigrationUtils.toLuceneQuery(predicate);
Then do the highlighting. The question "Hibernate search highlighting not analyzed fields" may help with that, though its code uses an older version of Lucene and you might have to adapt it:
// getBestFragment declares checked exceptions, hence the throws clause
String highlightText(Query query, Analyzer analyzer, String fieldName, String text)
        throws IOException, InvalidTokenOffsetsException {
    QueryScorer queryScorer = new QueryScorer(query);
    SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<span>", "</span>");
    Highlighter highlighter = new Highlighter(formatter, queryScorer);
    return highlighter.getBestFragment(analyzer, fieldName, text);
}
You'll need to add a dependency on org.apache.lucene:lucene-highlighter.
To retrieve the analyzer, use the Hibernate Search metadata: https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#backend-lucene-access-analyzers
So, connecting the dots... something like that?
Highlighter createHighlighter(SearchPredicate predicate, SearchScope<?> scope) {
    // Taking a shortcut here to retrieve the index manager,
    // since we already have the scope
    // WARNING: This only works when searching a single index
    Analyzer analyzer = scope.includedTypes().iterator().next().indexManager()
            .unwrap( LuceneIndexManager.class )
            .searchAnalyzer();
    // WARNING: this method is not supported and might disappear in future versions of HSearch
    Query query = LuceneMigrationUtils.toLuceneQuery(predicate);
    QueryScorer queryScorer = new QueryScorer(query);
    SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<span>", "</span>");
    return new Highlighter(formatter, queryScorer);
}
SearchSession searchSession = Search.session( entityManager );
SearchScope<Book> scope = searchSession.scope( Book.class );
SearchPredicate predicate = scope.predicate().match()
        .fields( "title", "authors.name" )
        .matching( "refactoring" )
        .toPredicate();
Highlighter highlighter = createHighlighter(predicate, scope);
// NOTE: "analyzer" below is not in scope as written; retrieve it the same way
// createHighlighter does (or return it alongside the Highlighter)
// Using Pair from Apache Commons, but others would work just as well
List<Pair<Book, String>> hits = searchSession.search( scope )
        .select( f -> f.composite(
                // Highlighting the title only, but you can do the same for other fields.
                // getBestFragment throws checked exceptions, so real code needs a try/catch here
                book -> Pair.of( book, highlighter.getBestFragment(analyzer, "title", book.getTitle())),
                f.entity()
        ) )
        .where( predicate )
        // fetchHits returns the List of hits directly; fetch would return a SearchResult
        .fetchHits( 20 );
Not sure this compiles, but that should get you started.
Relatedly, but not exactly what you're asking for, there's an explain feature to get a sense of why a given hit has a given score: https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#search-dsl-query-explain
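To give an idea of what the REST API would hand to the frontend, here is a plain-Java sketch of the kind of fragment the formatter above produces for the example record. It is only an illustration: it does not use Lucene, does no analysis or fuzzy matching, and simply wraps whole-word, case-insensitive occurrences of the keyword in the same <span> tags. The class and method names are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the Lucene Highlighter: exact whole-word matches only.
final class NaiveHighlighter {

    // Wraps every whole-word, case-insensitive occurrence of term in <span>...</span>.
    static String highlight(String text, String term) {
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(term) + "\\b", Pattern.CASE_INSENSITIVE);
        // $0 is the matched text itself, so the original casing is preserved
        return pattern.matcher(text).replaceAll("<span>$0</span>");
    }

    public static void main(String[] args) {
        System.out.println(highlight("ABC some test name", "test"));
        System.out.println(highlight("this is a test element", "test"));
        System.out.println(highlight("abc123", "test"));
    }
}
```

Fields that come back unchanged (like "abc123" here) did not match, so the frontend can use the presence of a <span> to decide which fields to mark.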

MongoDB & Java DBRef Usage

So let’s say I have a patient document in MongoDB. It has things such as first name, last name, etc. I am trying to add a list of providers to the current document (providers are in another collection, referenced by DBRef, as I am using POJOs). How would I append multiple providers to the document in Java?
One way to do this is by just appending to a document like this:
Document doc = new Document("user", userObject)
.append("providers", providersObject);
providersObject would be your list of providers.

Couchbase Java SDK: N1QL queries that include document id

I'm looking to perform a query on my Couchbase database using the Java client SDK, which will return a list of results that include the document id for each result. Currently I'm using:
Statement stat = select("*").from(i("myBucket"))
.where(x(fieldIwantToGet).eq(s(valueIwantToGet)));
N1qlQueryResult result = bucket.query(stat);
However, N1qlQueryResult seems to only return a list of JsonObjects without any of the associated metadata. Looking at the documentation, it seems I want a method that returns a list of Document objects, but I can't see any bucket methods I could call that do the job.
Anyone know a way of doing this?
You need to use the below query to get Document Id:
Statement stat = select("meta(myBucket).id").from(i("myBucket"))
.where(x(fieldIwantToGet).eq(s(valueIwantToGet)));
The above returns an array of document IDs.

Lucene : getting same length results when querying

So I've tried absolutely all query types in Lucene and none of them seemed to work. What I'm trying to do is simple: I want to query the index but only get exact matches. By exact I mean the results should have the same text (obviously) AND the same length, as usually happens when querying a database. For example, when I search for jodie foster, I get this text as one of the results: List of awards and nominations received by Jodie Foster. I don't want results containing the search term; I want results that are exactly the search term.
First of all, this is how I'm building the lucene index:
IndexWriterConfig luceneConfig = new IndexWriterConfig(new StandardAnalyzer());
Path path = Paths.get("C:/Users/i_l_g/Desktop/DBpedia/qls_labels");
Directory dir = FSDirectory.open(path);
IndexWriter writer = new IndexWriter(dir, luceneConfig);
while (rs.next()) {
    Document doc = new Document();
    doc.add(new Field("entity", rs.getString("entity"), TextField.TYPE_STORED));
    doc.add(new Field("label", rs.getString("label"), TextField.TYPE_STORED));
    writer.addDocument(doc);
}
rs is a ResultSet type variable and I'm obviously just extracting data from a table and indexing them using Lucene.
Next, I tried querying this index using all types of queries, but I'm getting the same set of results every time, it's almost as if I didn't even change the query type. My last attempt was using a PhraseQuery:
StandardAnalyzer analyzer = new StandardAnalyzer();
PhraseQuery.Builder builder = new PhraseQuery.Builder();
PhraseQuery q;
builder.add(new Term("label","jodie"));
builder.add(new Term("label","foster"));
builder.setSlop(0);
q=builder.build();
This is the set of results I'm getting every single time, if it could be of any help:
Found 5 hits.
1. Jodie Foster
2. Alicia Christian "Jodie" Foster
3. Jodie Foster filmography
4. Impress Jodie Foster
5. List of awards and nominations received by Jodie Foster
I didn't really think that it would take me that much time, I've been trying to solve this issue for 2 days now and I've visited tens of links and there appears to be no one that has ever had this problem. Please help.
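One way to read the results above: the label field is a tokenized TextField, so a PhraseQuery with slop 0 matches every label that contains the tokens "jodie" and "foster" next to each other, regardless of what else the label holds. The sketch below is plain Java, not Lucene, and its tokenizer is only a rough stand-in for StandardAnalyzer; the class and method names are made up. It contrasts that "contains the phrase" behaviour with a whole-value comparison.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of phrase matching versus whole-value matching.
final class ExactLabelMatch {

    // Rough stand-in for analysis: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Phrase-style match: the query tokens appear consecutively somewhere in the label.
    static boolean containsPhrase(String label, String query) {
        List<String> queryTokens = tokenize(query);
        return !queryTokens.isEmpty()
                && Collections.indexOfSubList(tokenize(label), queryTokens) >= 0;
    }

    // Exact match: the label consists of the query tokens and nothing else.
    static boolean matchesExactly(String label, String query) {
        return tokenize(label).equals(tokenize(query));
    }

    public static void main(String[] args) {
        String longLabel = "List of awards and nominations received by Jodie Foster";
        System.out.println(containsPhrase(longLabel, "jodie foster"));
        System.out.println(matchesExactly(longLabel, "jodie foster"));
        System.out.println(matchesExactly("Jodie Foster", "jodie foster"));
    }
}
```

In Lucene itself, the whole-value behaviour is usually obtained by also indexing the label into an untokenized field (for example a StringField) and querying that field with a TermQuery. That is a suggestion rather than something from the thread, and note that such a field is matched verbatim, so unlike this sketch it is case-sensitive unless the value is normalized before indexing and querying.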

Fuzzy search on a phrase using Hibernate Search

I am using Hibernate Search to search for titles of tv shows on my web app.
I can use the fuzzy() method on keyword() to perform fuzzy searches on keywords, but I need to take the whole title into account, so I am using phrase() instead of keyword(). The fuzzy() method is not defined for phrase(), so I was wondering if there is an easy way to achieve fuzzy searches on phrases using Hibernate Search.
If you just need a PhraseQuery with slop (that is, extra words thrown in), then you can set the slop on a phrase query like:
queryBuilder.phrase()
        .withSlop(2)
        .onField("myField")
        .sentence("this sentence missing something")
        .createQuery();
However, I'm not aware of anything in the Hibernate APIs that supports embedding fuzzy queries in phrases, but in Lucene, you can work with the SpanQuery API to build that. SpanMultiTermQueryWrapper and SpanNearQuery, in particular, are what you would need. Something like:
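FuzzyQuery query1 = new FuzzyQuery(new Term("field", "fuzy"));
FuzzyQuery query2 = new FuzzyQuery(new Term("field", "phrse"));
// SpanNearQuery takes SpanQuery clauses, so the wrappers must be typed as SpanQuery
SpanQuery wrappedQuery1 = new SpanMultiTermQueryWrapper<FuzzyQuery>(query1);
SpanQuery wrappedQuery2 = new SpanMultiTermQueryWrapper<FuzzyQuery>(query2);
SpanQuery[] clauses = {wrappedQuery1, wrappedQuery2};
// slop 0, in order: the fuzzy-matched terms must be adjacent and in this sequence
SpanNearQuery query = new SpanNearQuery(clauses, 0, true);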
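To make the intent of that span query concrete, here is a plain-Java sketch of the matching rule it expresses: each query term may differ from the corresponding document token by a few edits, and the matched tokens must be adjacent and in order (slop 0, inOrder true). It is a simplification and not Lucene code. It uses plain Levenshtein distance, whereas FuzzyQuery by default also counts a transposition as one edit, and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of "fuzzy terms, adjacent and in order".
final class FuzzyPhraseSketch {

    // Classic Levenshtein distance: insertions, deletions and substitutions each cost 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True if the query terms match consecutive document tokens, in order,
    // each within maxEdits of its counterpart.
    static boolean matches(List<String> docTokens, List<String> queryTerms, int maxEdits) {
        if (queryTerms.isEmpty()) {
            return false;
        }
        for (int start = 0; start + queryTerms.size() <= docTokens.size(); start++) {
            boolean allWithinDistance = true;
            for (int k = 0; k < queryTerms.size() && allWithinDistance; k++) {
                allWithinDistance =
                        editDistance(docTokens.get(start + k), queryTerms.get(k)) <= maxEdits;
            }
            if (allWithinDistance) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> title = Arrays.asList("a", "fuzzy", "phrase", "search");
        System.out.println(matches(title, Arrays.asList("fuzy", "phrse"), 2));
    }
}
```

The maxEdits value of 2 in the usage line mirrors FuzzyQuery's default maximum edit distance; raising the slop on the SpanNearQuery would correspond to allowing gaps between the matched tokens, which this sketch does not model.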
