Order results according to most matched Criteria-solr - java

I am using the SolrJ API to fetch results from Solr.
My query looks like this:
solrQuery.addFilterQuery("connection:(${user.uniqueKey()}) OR followers:(${user.uniqueKey()}) OR company:(${currentCompanies})")
I want the results that meet the most criteria (out of connection, followers, and company) to come first.
That is, a result that matches connection, followers, and company should come before a result that matches only connection and followers.

You might actually be better off not using a filter query, since filter queries only restrict the result set and do not contribute to the score. In the main query you can do something like this:
(yourField1:value1 OR yourField2:value2 OR yourField3:value3) OR (yourField1:value1 AND yourField2:value2 AND yourField3:value3)^100.0
You'll need to experiment a little to find the right boost value.
What you're doing here is telling Solr to score documents higher on an AND search, but to still return results that fit the OR search.
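As a rough illustration of the query shape described above, here is a minimal sketch that assembles the "OR part plus boosted AND part" string from a map of field/value pairs. The class name BoostedMatchQuery, the sample values, and the boost of 100.0 are assumptions for illustration only; values are inserted verbatim, so escaping of special query characters is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class BoostedMatchQuery {

    // Builds "(f1:(v1) OR f2:(v2)) OR (f1:(v1) AND f2:(v2))^boost".
    // Hypothetical helper: values are not escaped in this sketch.
    static String build(Map<String, String> criteria, double boost) {
        String anyOf = join(criteria, " OR ");
        String allOf = join(criteria, " AND ");
        return "(" + anyOf + ") OR (" + allOf + ")^" + boost;
    }

    // Renders every entry as field:(value) and joins the clauses with the given operator.
    private static String join(Map<String, String> criteria, String operator) {
        return criteria.entrySet().stream()
                .map(e -> e.getKey() + ":(" + e.getValue() + ")")
                .collect(Collectors.joining(operator));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the clauses in insertion order.
        Map<String, String> criteria = new LinkedHashMap<>();
        criteria.put("connection", "u42"); // sample values, not from the question
        criteria.put("followers", "u42");
        criteria.put("company", "acme");
        System.out.println(build(criteria, 100.0));
    }
}
```

The resulting string would go into the main query (for example via solrQuery.setQuery(...)) rather than addFilterQuery, so that the boost can influence the ordering.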

Related

How to query DynamoDB and return the first matching result instead of querying the whole collection?

We have some old code that queries DynamoDB to find a list of matching records.
Sample code below:
final DynamoDBQueryExpression<MyObject> queryExp = new DynamoDBQueryExpression<MyObject>()
    .withHashKeyValues(myObject)
    .withIndexName(indexName)
    .withScanIndexForward(false)
    .withConsistentRead(true)
    .withLimit(rowsPerPage);
final PaginatedQueryList<MyObject> ruleInstanceList = dynamoDBMapper.query(MyObject.class, queryExp);
This is a slow operation, since the query returns a list of matching MyObject items, and I noticed that all we use it for is to check whether the list is empty.
So what I want is either a query that finds just the first element or a different kind of query that checks that the count is greater than 0. All I need is to verify that a record exists, so that I can reduce the latency.
How can I achieve this?
The documentation for getLimit() indicates:
Note that when calling DynamoDBMapper.query, multiple requests are made to DynamoDB if needed to retrieve the entire result set. Setting this will limit the number of items retrieved by each request, NOT the total number of results that will be retrieved. Use DynamoDBMapper.queryPage to retrieve a single page of items from DynamoDB.
To limit the number of results, use queryPage() instead of query(), and apply withLimit(1) to your query expression.
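To make the difference between a per-request limit and a total limit concrete, here is a small stdlib-only simulation. It is not the AWS SDK: PagedStore, queryAll, and exists are hypothetical stand-ins that only model the behavior the quoted documentation describes (query keeps requesting pages until the whole result set is loaded, queryPage makes a single request).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FirstPageOnly {

    // Hypothetical stand-in for a paged data store; this is NOT the AWS SDK.
    static class PagedStore {
        private final List<String> items;
        int requests = 0; // number of simulated round trips

        PagedStore(List<String> items) {
            this.items = items;
        }

        // One simulated request: returns at most `limit` items starting at `offset`.
        List<String> fetchPage(int offset, int limit) {
            requests++;
            int end = Math.min(items.size(), offset + limit);
            return new ArrayList<>(items.subList(Math.min(offset, end), end));
        }
    }

    // Models query(): the limit caps each request, not the total,
    // so pages are requested until a short page signals the end.
    static List<String> queryAll(PagedStore store, int limit) {
        List<String> all = new ArrayList<>();
        List<String> page;
        do {
            page = store.fetchPage(all.size(), limit);
            all.addAll(page);
        } while (page.size() == limit);
        return all;
    }

    // Models queryPage() with a limit of 1: a single request, then stop.
    static boolean exists(PagedStore store) {
        return !store.fetchPage(0, 1).isEmpty();
    }

    public static void main(String[] args) {
        PagedStore full = new PagedStore(Arrays.asList("a", "b", "c", "d", "e"));
        queryAll(full, 1);
        System.out.println("requests to load everything: " + full.requests);

        PagedStore probe = new PagedStore(Arrays.asList("a", "b", "c", "d", "e"));
        System.out.println("exists: " + exists(probe) + ", requests: " + probe.requests);
    }
}
```

With the real mapper, the equivalent of exists would be to call queryPage with withLimit(1) on the expression and check whether the returned page's result list is empty.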

Apache Solr, Require multiple of a single field while optimizing query to only hit specific indexes

My data is partitioned inside Solr so that when I send the request "+apple" (apple required), I only hit partition 'a' to search.
Because of this optimization, I cannot easily use boolean logic that spans all my data.
Solr query: +fruit:banana +fruit:apple
Result: no fruit record has both values, so I get 0 results, because I am searching the 'a' partition and the 'b' partition, each with both required terms. It is very unlikely that a record's fruit field has two names; however, this is a multi-valued field, so it is possible, and I want those records to be at the top of my results from Solr.
One way to get what I want would be to change the query to: fruit:banana fruit:apple
However, this will sometimes return results that are neither apple nor banana, because Solr marks both as optional, so I allow it to search all my indexes. For example:
fruit:banana fruit:apple Country:Mexico
This might return oranges in Mexico, in which case I would rather get 0 results.
Also, doing two separate queries is not an option. Does anyone know of a better way to get this 'REQUIRED OR' functionality with my partition optimization?
I am also open to other designs; I'm just looking for input.

Using Apache Solr's boost query function with Spring in Java

I'm writing a Java application that is using Apache Solr to index and search through a list of articles. A requirement I am dealing with is that when a user searches for something, we are supplying a list of recommended related search terms, and the user has the option to include those extra terms in their search. The problem I'm having, however, is that we want the user's original search term to be prioritized, and results that match that should appear before results that only match related terms.
My research suggests that Solr's boost function is the solution for this, but I'm having some trouble getting it to work with Spring. The code all runs fine and I get my search results as expected, but the boost function doesn't seem to actually be re-ordering my searches at all. For example, I'm trying to do something like this:
Query query = new SimpleQuery();
Criteria searchCriteria = Criteria.where("title").contains("A").boost((float) 2);
Criteria extraCriteria = Criteria.where("title").contains("B").boost((float) 1);
query.addCriteria(searchCriteria.or(extraCriteria));
In this example I would be searching for any document whose title contains "A" or "B", but I want to boost results that match "A" to the top of the list.
I've also tried using the Extended DisMax Query Parser with a different syntax to achieve the same result, with similar lack of success. To follow the same example pattern, I'm trying to use the expression criteria as follows:
Query query = new SimpleQuery();
Criteria searchCriteria = Criteria.where("title").expression("A^2.0 OR B^1.0");
query.setDefType("edismax");
query.addCriteria(searchCriteria);
Again I would expect this to return documents with titles matching "A" or "B" but boost results matching "A", and again it simply doesn't seem to actually affect the ordering of my results at all.
Okay, I figured out the problem here. Elsewhere in the code someone else had added this snippet:
query.setPageRequest(pageable);
This was done to support pagination of the search results, but the pageable object also contained some sort orders, which look like they got added to the query as part of the .setPageRequest method. Something to look out for in the future: it looks like sorts override boosting when working with Spring Solr queries in this scenario.

IN Equivalent Query In Solr and Solrj

I am using Solr 5.0.0. I would like to know the equivalent of an IN query in Solr or SolrJ.
If I need to query products of different brands, I can use an IN clause. Say I have brands like dell, sony, and samsung; I need to find the products with these brands using Solr and, in Java, SolrJ.
I am currently using this code in SolrJ:
qry.addFilterQuery("brand:dell OR brand:sony OR brand:samsung");
I know that I can use OR here, but I would like to know about an IN equivalent in Solr, and about the performance of OR.
As you can read in Solr's wiki page on its query syntax, Solr by default uses a superset of Lucene's query parser. As both documents show, nothing like IN exists. But you can make the query shorter than the example you presented.
If your default operator is OR, you can leave it out of the query. In addition, you can make use of field grouping.
qry.addFilterQuery("brand:(dell sony samsung)");
In case OR is not your default operator or you are not sure about this, you can employ Local Parameters for the filter query so that OR is enforced. Afterwards you can again make use of Field Grouping.
qry.addFilterQuery("{!q.op=OR}brand:(dell sony samsung)");
Keep in mind that you need to surround a phrase with double quotes (") to keep the words together:
qry.addFilterQuery("{!q.op=OR}brand:(dell sony samsung \"packard bell\")");
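If the brand list comes from user input or another collection, the grouped filter string can be assembled programmatically. The following is only a sketch: the class name InFilterQuery and its helper are hypothetical, and it handles nothing beyond quoting values that contain whitespace (other special query characters are not escaped).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class InFilterQuery {

    // Builds {!q.op=OR}field:(v1 v2 "multi word") from a list of values.
    static String build(String field, List<String> values) {
        String grouped = values.stream()
                .map(InFilterQuery::quoteIfPhrase)
                .collect(Collectors.joining(" "));
        return "{!q.op=OR}" + field + ":(" + grouped + ")";
    }

    // Wraps values containing whitespace in double quotes so the words stay together.
    // Escaping of other special query characters is deliberately left out of this sketch.
    private static String quoteIfPhrase(String value) {
        boolean hasWhitespace = value.chars().anyMatch(Character::isWhitespace);
        return hasWhitespace ? "\"" + value + "\"" : value;
    }

    public static void main(String[] args) {
        List<String> brands = Arrays.asList("dell", "sony", "samsung", "packard bell");
        // prints {!q.op=OR}brand:(dell sony samsung "packard bell")
        System.out.println(build("brand", brands));
    }
}
```

The returned string can be passed to qry.addFilterQuery(...) exactly like the hand-written versions above.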

Implementing result paging in hibernate (getting total number of rows)

How do I implement paging in Hibernate? The Query object has methods called setMaxResults and setFirstResult, which are certainly helpful. But where can I get the total number of results, so that I can show a link to the last page of results and print things such as "results 200 to 250 of xxx"?
You can use Query.setMaxResults(int results) and Query.setFirstResult(int offset).
Edit: There's no way to know how many results you'll get, so you must first query with "select count(*)...". A little ugly, IMHO.
You must do a separate query to get the total count. If new records are added, or existing records start to match the criteria, between one paging request and the next, you have to query the count again to reflect that. I usually do this in HQL like this:
// count(*) returns a Long in current Hibernate versions, so go through Number instead of casting to Integer
long count = ((Number) session.createQuery("select count(*) from ....").uniqueResult()).longValue();
For Criteria queries I usually push my data into a DTO like this:
ScrollableResults scrollable = criteria.scroll(ScrollMode.SCROLL_INSENSITIVE);
if (scrollable.last()) { // returns true if there is a result set
    genericDTO.setTotalCount(scrollable.getRowNumber() + 1);
    criteria.setFirstResult(command.getStart())
            .setMaxResults(command.getLimit());
    genericDTO.setLineItems(Collections.unmodifiableList(criteria.list()));
}
scrollable.close();
return genericDTO;
You could perform two queries: a count(*) type query, which should be cheap if you are not joining too many tables together, and a second query that has the limits set. Then you know how many items exist but only grab the ones being viewed.
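Once the count query has returned, the "results 200 to 250 of xxx" line and the last page number are plain arithmetic. Here is a minimal sketch; the class PageInfo and its field names are hypothetical, and it assumes offset is the value passed to setFirstResult, pageSize the value passed to setMaxResults, and totalCount the result of the count(*) query.

```java
public class PageInfo {
    final long firstRow;   // 1-based index of the first row on this page, 0 if the page is empty
    final long lastRow;    // 1-based index of the last row on this page, 0 if the page is empty
    final long lastPage;   // 1-based number of the last page, 0 if there are no rows
    final long totalCount; // total rows as reported by the count query

    PageInfo(int offset, int pageSize, long totalCount) {
        if (offset < 0 || pageSize <= 0 || totalCount < 0) {
            throw new IllegalArgumentException("offset/totalCount must be >= 0 and pageSize > 0");
        }
        boolean empty = offset >= totalCount;
        this.firstRow = empty ? 0 : offset + 1L;
        // The last page may be shorter than pageSize, so clamp to the total.
        this.lastRow = empty ? 0 : Math.min(offset + (long) pageSize, totalCount);
        // Ceiling division: a partially filled page still counts as a page.
        this.lastPage = (totalCount + pageSize - 1) / pageSize;
        this.totalCount = totalCount;
    }

    @Override
    public String toString() {
        return "results " + firstRow + " to " + lastRow + " of " + totalCount;
    }

    public static void main(String[] args) {
        PageInfo info = new PageInfo(200, 50, 1234);
        System.out.println(info + ", last page: " + info.lastPage);
    }
}
```

Because the count and the page are fetched by separate queries, the numbers can drift if rows change between the two; the sketch simply reports whatever count it is given.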
One thing you can do is prepare the Criteria query according to your business requirements, with all predicates, sorting, searching, etc.,
and then do the following:
CriteriaBuilder criteriaBuilder = em.getCriteriaBuilder();
CriteriaQuery<Feedback> criteriaQuery = criteriaBuilder.createQuery(Feedback.class);
Root<Feedback> root = criteriaQuery.from(Feedback.class);
// Prepare all your predicates as your business logic requires, e.g.:
Predicate yourPredicateAsPerYourBusinessNeed = criteriaBuilder.equal(root.get("applicationName"), applicationName);
criteriaQuery.where(yourPredicateAsPerYourBusinessNeed).distinct(true);
TypedQuery<Feedback> criteriaQueryWithPredicate = em.createQuery(criteriaQuery);
// Get the total count here (note: this reads every matching row just to count them)
Long totalCount = criteriaQueryWithPredicate.getResultStream().distinct().count();
Now we have the query and its total count.
So we can apply pagination to that query as below:
List<Feedback> feedbackList = criteriaQueryWithPredicate.setFirstResult(offset).setMaxResults(pageSize).getResultList();
Now you can prepare a wrapper containing the List returned by the DB along with the totalCount, the starting page number (the offset in this case), the page size, etc., and return it to your service / controller class.
I am 101% sure this will solve your problem, because I was facing the same problem and solved it the same way.
Thanks- Sunil Kumar Mali
You can just set setMaxResults to the maximum number of rows you want returned. There is no harm in setting this value greater than the number of actual rows available. The problem with the other solutions is that they assume the ordering of records remains the same on each repeat of the query, and that no changes happen between commands.
To avoid that, if you really want to scroll through results, it is best to use ScrollableResults. Don't throw this object away between pages; use it to keep the records in the same order. To find out the number of records from the ScrollableResults, you can simply move to the last() position and then get the row number. Remember to add 1 to this value, since row numbers start counting at 0.
I personally think you should handle the paging in the front end. I know this isn't that efficient, but at least it would be less error-prone.
If you use the count(*) approach, what happens if records get deleted from the table in between requests for a certain page? Lots of things could go wrong this way.
