I'm writing a Java application that is using Apache Solr to index and search through a list of articles. A requirement I am dealing with is that when a user searches for something, we are supplying a list of recommended related search terms, and the user has the option to include those extra terms in their search. The problem I'm having, however, is that we want the user's original search term to be prioritized, and results that match that should appear before results that only match related terms.
My research suggests that Solr's boosting is the solution for this, but I'm having some trouble getting it to work with Spring. The code all runs fine and I get my search results as expected, but the boosts don't seem to reorder my results at all. For example, I'm trying to do something like this:
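import org.springframework.data.solr.core.query.Criteria;
import org.springframework.data.solr.core.query.Query;
import org.springframework.data.solr.core.query.SimpleQuery;
Query query = new SimpleQuery();
Criteria searchCriteria = Criteria.where("title").contains("A").boost(2.0f);
Criteria extraCriteria = Criteria.where("title").contains("B").boost(1.0f);
query.addCriteria(searchCriteria.or(extraCriteria));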
In this example I would be searching for any document whose title contains "A" or "B", but I want to boost results that match "A" to the top of the list.
I've also tried using the Extended DisMax Query Parser with a different syntax to achieve the same result, with similar lack of success. To follow the same example pattern, I'm trying to use the expression criteria as follows:
Query query = new SimpleQuery();
Criteria searchCriteria = Criteria.where("title").expression("A^2.0 OR B^1.0");
query.setDefType("edismax");
query.addCriteria(searchCriteria);
Again I would expect this to return documents with titles matching "A" or "B" but boost results matching "A", and again it simply doesn't seem to actually affect the ordering of my results at all.
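For reference, a minimal sketch of building such a boosted expression as a plain query string, assuming standard Lucene/edismax syntax where ^ sets a per-clause boost. The class and method names, the title field and the boost values are illustrative only. Keep in mind that boosts only change the relevance score, so they only affect the order of results when the results are sorted by score.

```java
import java.util.List;
import java.util.StringJoiner;

// Sketch: the user's own term gets a higher boost than the suggested related terms.
// Field name, boost values and class name are illustrative assumptions.
public class BoostedQueryBuilder {

    // Wrap a term in double quotes, escaping backslashes first and then quotes,
    // so multi-word terms are treated as a single phrase.
    static String quote(String term) {
        return "\"" + term.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Produces e.g. title:"A"^2.0 OR title:"B"^1.0
    static String build(String field, String userTerm, float userBoost,
                        List<String> relatedTerms, float relatedBoost) {
        StringJoiner or = new StringJoiner(" OR ");
        or.add(field + ":" + quote(userTerm) + "^" + userBoost);
        for (String related : relatedTerms) {
            or.add(field + ":" + quote(related) + "^" + relatedBoost);
        }
        return or.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("title", "A", 2.0f, List.of("B"), 1.0f));
    }
}
```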
Okay, I figured out the problem here. Elsewhere in the code someone else had added this snippet:
query.setPageRequest(pageable);
This was done to support pagination of the search results, but the pageable object ALSO contained some sort orders, which apparently got added to the query by the setPageRequest method. Something to look out for: an explicit sort on a field takes precedence over relevance ranking, so with those sort orders in place the boosts had no visible effect on the order of my Spring Data Solr results.
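To make the sort-versus-boost interaction concrete, here is a toy model in plain Java (not Solr or Spring Data code; the scores and dates are invented for illustration). Boosts change each hit's score, but once an explicit field sort is applied the score is no longer consulted. In Spring Data, one way to keep pagination without the sort could be to rebuild the page request from just the page number and size, e.g. PageRequest.of(pageable.getPageNumber(), pageable.getPageSize()) (Spring Data 2.x and later); whether that fits depends on why the sort was added in the first place.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model, not Solr/Spring code: each hit has a relevance score (what boosts change)
// and a stored field that a sort inside the Pageable might use.
public class SortVsScore {

    record Hit(String title, double score, String publishedDate) {}

    // Relevance ordering: highest score first. This is the only place boosts matter.
    static List<String> byScore(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(Comparator.comparingDouble(Hit::score).reversed());
        return copy.stream().map(Hit::title).toList();
    }

    // Explicit field sort (newest first): the score is ignored entirely.
    static List<String> byField(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(Comparator.comparing(Hit::publishedDate).reversed());
        return copy.stream().map(Hit::title).toList();
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("A-match", 2.0, "2020-01-01"),   // matches the boosted user term
                new Hit("B-match", 1.0, "2021-06-01"));  // matches only a related term
        System.out.println(byScore(hits));  // [A-match, B-match]
        System.out.println(byField(hits));  // [B-match, A-match]
    }
}
```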
I created a Lucene index for a set of data and am trying to retrieve results from it.
When I do a boolean query with SHOULD, Lucene returns the expected results,
e.g. (title:"america")
But when I do a MUST_NOT query, it returns empty results even though a lot of data satisfies this criterion.
(-title:"america")
I think I am making a silly mistake but haven't been able to figure it out so far. Could someone please give me some pointers?
I understood the issue: a MUST_NOT clause has to be combined with at least one other, non-negated clause.
Quote from Lucene in Action by Erik Hatcher et al. (https://www.bookdepository.com/Lucene-in-Action-Erik-Hatcher/9781933988177):
Placing a NOT in front of a term excludes documents matching the following term. Negating a term must be combined with at least one non-negated term to return documents; in other words, it isn’t possible to use a query like NOT term to find all documents that don’t contain a term.
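A simplified model of why the pure MUST_NOT query comes back empty, in plain Java (not Lucene's actual implementation; the document ids and terms are made up). Positive clauses decide which documents are candidates, and MUST_NOT clauses can only remove candidates. In real Lucene the usual fix is to add a MatchAllDocsQuery clause next to the MUST_NOT clause, which in query-parser syntax is *:* -title:"america".

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model, not Lucene's implementation: positive clauses produce
// candidate documents, and MUST_NOT clauses can only remove candidates.
public class NegativeOnlyQuery {

    // docs maps a document id to the terms in its title field.
    // matchAll plays the role of a MatchAllDocsQuery (*:*) clause.
    static Set<String> search(Map<String, Set<String>> docs, boolean matchAll,
                              List<String> mustNotTerms) {
        Set<String> candidates = new LinkedHashSet<>();
        if (matchAll) {
            candidates.addAll(docs.keySet());
        }
        // MUST_NOT only filters the candidate set; it never contributes documents.
        candidates.removeIf(id -> mustNotTerms.stream().anyMatch(t -> docs.get(id).contains(t)));
        return candidates;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> docs = Map.of(
                "d1", Set.of("america", "history"),
                "d2", Set.of("europe"),
                "d3", Set.of("asia"));
        // Like -title:"america" on its own: no positive clause, so no results.
        System.out.println(search(docs, false, List.of("america")));
        // Like *:* -title:"america": every document except d1.
        System.out.println(search(docs, true, List.of("america")));
    }
}
```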
I am using the SolrJ API to fetch results from Solr.
My query looks like this:
solrQuery.addFilterQuery("connection:(${user.uniqueKey()}) OR followers:(${user.uniqueKey()}) OR company:(${currentCompanies})")
I want the results that meet the most criteria (connection, followers, company) to come first.
That is, a result that matches connection, followers and company should come before a result that matches only connection and followers.
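You might be better off not using a filter query: filter queries only restrict which documents match and do not contribute to the score. Put the conditions in the main query instead, so you can do something like this: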
(yourField1:value1 OR yourField2:value2 OR yourField3:value3) OR (yourField1:value1 AND yourField2:value2 AND yourField3:value3)^100.0
You'll need to experiment a little to find the right boost value.
What you're doing here is telling Solr to score documents higher on an AND search, but to still return results that fit the OR search.
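A minimal sketch of generating that query from a map of field/value pairs, so it can be passed as the main query (for example via solrQuery.setQuery(...)) instead of addFilterQuery. The class name, the sample values and the boost of 100.0 are assumptions. Values are inserted verbatim, so anything coming from users should be escaped first (for single values, SolrJ's ClientUtils.escapeQueryChars can do this).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: "(f1:(v1) OR f2:(v2) ...) OR (f1:(v1) AND f2:(v2) ...)^boost".
// The OR part decides what matches; the boosted AND part raises the score of
// documents that satisfy every condition. Values are NOT escaped here.
public class AllCriteriaBoost {

    static String build(Map<String, String> criteria, double boost) {
        StringJoiner any = new StringJoiner(" OR ", "(", ")");
        StringJoiner all = new StringJoiner(" AND ", "(", ")");
        criteria.forEach((field, value) -> {
            String clause = field + ":(" + value + ")";
            any.add(clause);
            all.add(clause);
        });
        return any + " OR " + all + "^" + boost;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the clauses in insertion order; values are hypothetical.
        Map<String, String> criteria = new LinkedHashMap<>();
        criteria.put("connection", "user42");
        criteria.put("followers", "user42");
        criteria.put("company", "acme");
        System.out.println(build(criteria, 100.0));
    }
}
```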
I'm integrating Hibernate Search in my project and at the moment it works fine.
Now I want to refine my search: basically, I'd like users to be able to pass a query like term1 AND term2 OR term3, and so on. The number of terms can vary, of course.
So my idea is to build a search with logical operators to help users find what they want.
You have to group conditions that use AND and OR with parentheses.
e.g.
(term1 AND term2) OR term3
If you want to add another term, nest the groups further, e.g. ((term1 AND term2) OR term3) AND term4.
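If the terms come from users, one way to avoid relying on operator precedence is to generate the parentheses yourself. A minimal sketch, assuming the query has already been split into OR-ed groups of AND-ed terms (that List&lt;List&lt;String&gt;&gt; representation and the class name are hypothetical):

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: render an "OR of AND-groups" as an explicitly parenthesized query string,
// so no operator precedence is left to the query parser.
public class ParenthesizedQuery {

    static String orOfAnds(List<List<String>> groups) {
        return groups.stream()
                .map(group -> group.size() == 1
                        ? group.get(0)
                        : group.stream().collect(Collectors.joining(" AND ", "(", ")")))
                .collect(Collectors.joining(" OR "));
    }

    // Require one more term on top of an already-built expression.
    static String andTerm(String expression, String term) {
        return "(" + expression + ") AND " + term;
    }

    public static void main(String[] args) {
        String q = orOfAnds(List.of(List.of("term1", "term2"), List.of("term3")));
        System.out.println(q);                    // (term1 AND term2) OR term3
        System.out.println(andTerm(q, "term4"));  // ((term1 AND term2) OR term3) AND term4
    }
}
```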
You can use this Stack Overflow answer if you have one entity.
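You can use a boolean query like this:
// b is a Hibernate Search QueryBuilder, e.g. obtained with
// fullTextSession.getSearchFactory().buildQueryBuilder().forEntity(YourEntity.class).get()
Query luceneQuery = b.bool()
.must(b.keyword().onField("fieldName").matching("term1").createQuery())
.must(b.keyword().onField("fieldName").matching("term2").createQuery())
.should(b.keyword().onField("fieldName").matching("term3").createQuery())
.must(b.keyword().onField("fieldName").matching("term4").createQuery()).not()
.createQuery();
must: the document must match this clause (like AND).
should: the document may match this clause (like OR). When the query also has must clauses, should clauses become optional and only raise the score, so this example behaves like term1 AND term2 AND NOT term4, with term3 as a bonus.
must(...).not(): excludes documents that match this clause (like NOT).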
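In Lucene's BooleanQuery, which Hibernate Search builds on, should clauses become optional as soon as there is at least one must clause, so a flat must/must/should query is not the same as (term1 AND term2) OR term3. A simplified plain-Java model of the matching rules (not Hibernate Search or Lucene code; scoring is ignored, class names are hypothetical) shows the difference. With the Hibernate Search DSL the nested form would be built by passing an inner b.bool()...createQuery() to should(...).

```java
import java.util.List;
import java.util.Set;

// Simplified model of BooleanQuery matching (no scoring, default minimum-should-match):
// - no mustNot clause may match and every must clause has to match;
// - with at least one must clause, should clauses are optional;
// - otherwise at least one should clause has to match (so only-mustNot matches nothing).
public class BoolModel {

    interface Node { boolean matches(Set<String> docTerms); }

    record Term(String value) implements Node {
        public boolean matches(Set<String> docTerms) { return docTerms.contains(value); }
    }

    record Bool(List<Node> must, List<Node> should, List<Node> mustNot) implements Node {
        public boolean matches(Set<String> docTerms) {
            if (mustNot.stream().anyMatch(n -> n.matches(docTerms))) return false;
            if (!must.stream().allMatch(n -> n.matches(docTerms))) return false;
            if (!must.isEmpty()) return true; // should clauses would only affect the score
            return should.stream().anyMatch(n -> n.matches(docTerms));
        }
    }

    public static void main(String[] args) {
        Node t1 = new Term("term1"), t2 = new Term("term2"), t3 = new Term("term3");
        // Flat: must term1, must term2, should term3 -> effectively term1 AND term2
        Node flat = new Bool(List.of(t1, t2), List.of(t3), List.of());
        // Nested: should(term1 AND term2), should term3 -> (term1 AND term2) OR term3
        Node nested = new Bool(List.of(),
                List.of(new Bool(List.of(t1, t2), List.of(), List.of()), t3), List.of());
        Set<String> doc = Set.of("term3");
        System.out.println(flat.matches(doc));    // false
        System.out.println(nested.matches(doc));  // true
    }
}
```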
I am new to MongoDB and I am trying to sort all my records by date. I have records from mixed sources and I am trying to sort them separately. For some records I didn't set dateCreated when writing them to the db; I noticed this later and started adding dateCreated to my records. Say I have 4000 records in total: the first 1000 don't have dateCreated, and the latest 3000 do. I am trying to get the most recently updated records using the dateCreated field. Here is my code.
db.person.find({"source":"Naukri"}&{dateCreated:{$exists:true}}).sort({dateCreated: 1}).limit(10)
This code returns some results (from those 1000 records) in which I can't see the dateCreated field at all. Moreover, if I change the sort to {dateCreated: -1}, I get results from some other source, not Naukri.
So I need help with this:
How do I sort by dateCreated to get the most recently updated records, while also filtering by source?
I am using the Java API to get the records from Mongo, so I'd be grateful if someone could also show me how to write the same query in Java.
Hope my question is clear. Thanks in advance.
From the documentation you will (and you will, won't you - nod yes) read, you will find that the first argument to the find command you are using is what is called a query document. In this document you specify a list of fields and conditions, comma separated, which is equivalent to an AND condition in a declarative language such as SQL.
The problem with your query is that it was not a valid query document: & is JavaScript's bitwise AND operator, not a way to combine conditions, so your source and dateCreated conditions were never applied as intended. The correct syntax is as follows:
db.person.find({"source":"Naukri", dateCreated:{$exists:true}})
.sort({dateCreated: -1})
.limit(10)
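This filters by the value provided for "source" and by whether the "dateCreated" field exists (note that $exists: true also matches documents where the field is present but set to null), and sorts with -1 (descending) so the newest records come first.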
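A plain-Java simulation (no MongoDB driver involved; the records and source names are made up) of what the corrected query does, and of why the original ascending sort surfaced records without dateCreated: when sorting, MongoDB treats a missing field as null, and null sorts before any date. In this model a null dateCreated stands for a missing field.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Simulation only, no MongoDB driver: a null dateCreated models a missing field.
public class LatestBySource {

    record Person(String name, String source, LocalDate dateCreated) {}

    // Models find({source: <source>, dateCreated: {$exists: true}}).sort({dateCreated: -1}).limit(n)
    static List<String> latest(List<Person> people, String source, int n) {
        return people.stream()
                .filter(p -> source.equals(p.source()))
                .filter(p -> p.dateCreated() != null)
                .sorted(Comparator.comparing(Person::dateCreated).reversed())
                .limit(n)
                .map(Person::name)
                .collect(Collectors.toList());
    }

    // Models sort({dateCreated: 1}) with no filter: missing values sort first.
    static List<String> ascendingUnfiltered(List<Person> people, int n) {
        return people.stream()
                .sorted(Comparator.comparing(Person::dateCreated,
                        Comparator.nullsFirst(Comparator.naturalOrder())))
                .limit(n)
                .map(Person::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person("old", "Naukri", null),                      // no dateCreated
                new Person("n1", "Naukri", LocalDate.of(2014, 1, 1)),
                new Person("n2", "Naukri", LocalDate.of(2014, 3, 1)),
                new Person("other", "OtherSource", LocalDate.of(2014, 5, 1)));
        System.out.println(ascendingUnfiltered(people, 1)); // [old]
        System.out.println(latest(people, "Naukri", 10));   // [n2, n1]
    }
}
```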
I recommend looking at the links below; the first two cover structuring MongoDB queries and the find method and its arguments. All of this functionality translates to every language driver.
As for the Java API, there are different approaches depending on which you are comfortable with. The API provides a BasicDBObject class, which is more or less equivalent to the JSON document notation and works much like a hash map. For something closer to the shell methods, and to the style of some dynamic languages, there is the QueryBuilder class, for which the last two links give examples and documentation. It allows chaining, which makes your queries more readable.
There are many examples on Stack Overflow alone. I suggest you take a look.
http://docs.mongodb.org/manual/tutorial/query-documents/
http://docs.mongodb.org/manual/reference/method/db.collection.find/
How to do this MongoDB query using java?
http://api.mongodb.org/java/2.2/com/mongodb/QueryBuilder.html
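Your query is not correct. Update it as follows (-1 sorts descending, so the latest records come first):
db.person.find({"source":"Naukri", dateCreated:{$exists:true}}).sort({dateCreated: -1}).limit(10)
In Java, you can do it as follows:
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

Mongo mongo = ...
DB db = mongo.getDB("yourDbName");
DBCollection coll = db.getCollection("person");
// Conditions in the same query document are ANDed together
DBObject query = new BasicDBObject();
query.put("source", "Naukri");
query.put("dateCreated", new BasicDBObject("$exists", true));
// -1 = descending on dateCreated, i.e. newest first
DBCursor cur = coll.find(query).sort(new BasicDBObject("dateCreated", -1)).limit(10);
while (cur.hasNext()) {
    DBObject obj = cur.next();
    // Get data from the result object here
}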
I have indexed my database using Hibernate Search. I use a custom analyzer, both for indexing and for querying. I have a field called inchikey that should not get tokenized. Example values are:
BBBAWACESCACAP-UHFFFAOYSA-N
KEZLDSPIRVZOKZ-AUWJEWJLSA-N
When I look into my index with Luke I can confirm that they are not tokenized, as required.
However, when I try to search them using the web app, some inchikeys are found and others are not. Curiously, for these inchikeys the search DOES work when I search without the last hyphen, like so: BBBAWACESCACAP-UHFFFAOYSA N
I have not been able to find a common element in the inchikeys that are not found.
Any idea what is going on here?
I use a MultiFieldQueryParser to search over the different fields in the database:
String[] searchfields = Compound.getSearchfields();
MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_29, searchfields, new ChemicalNameAnalyzer());
//Disable the following if search performance is too slow
parser.setAllowLeadingWildcard(true);
FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(parser.parse("searchterms"), Compound.class);
List<Compound> hits = fullTextQuery.list();
More details about our setup have been posted here by Tim and me.
It turns out the last entries in the input file are not being indexed correctly: these ARE being tokenized. In fact, it seems they are indexed twice, once untokenized and once tokenized. When I search, I cannot find the untokenized version.
I have not yet found the reason, but I think it perhaps has to do with our parser ending while Lucene is still indexing the last entries, and as a result Lucene reverting to the default analyzer (StandardAnalyzer). When I find the culprit I will report back here.
Adding @Analyzer(impl = ChemicalNameAnalyzer.class) to the fields solves the problem, but what I want is my original setup, with the default analyzer defined once in the config, like so:
<property name="hibernate.search.analyzer">path.to.ChemicalNameAnalyzer</property>