Elasticsearch + Java - Can you further optimize this query?

We have an Elasticsearch index that contains millions of records, and we use it for global search. However, our query takes 2-4 seconds to return results. Can somebody help or advise how to further optimize the following query:
NativeSearchQuery query = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.boolQuery()
                .should(QueryBuilders.termQuery("name", searchText))
                .should(QueryBuilders.termQuery("key", searchText))
                .should(QueryBuilders.termQuery("sort_alpha", searchText))
                .should(QueryBuilders.termQuery("sort_number", searchText)))
        .withPageable(PageRequest.of(0, 100))
        .withHighlightFields(new HighlightBuilder.Field("name"),
                new HighlightBuilder.Field("key"),
                new HighlightBuilder.Field("sort_alpha"),
                new HighlightBuilder.Field("sort_number"))
        .build();
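(Not from the original post, but as a hedged sketch: if the four fields are exact-match keyword fields, the four should clauses can often be collapsed into a single multi_match query over the same fields, which is easier to tune; whether it is actually faster should be verified with the Profile API.)

```json
{
  "query": {
    "multi_match": {
      "query": "<searchText>",
      "fields": ["name", "key", "sort_alpha", "sort_number"]
    }
  }
}
```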

Related

Lucene QueryParser builds the query correctly but search doesn't work

I have indexed an IntPoint field using Lucene, which I am able to fetch with the query below:
Query query = IntPoint.newRangeQuery("field1", 0, 40);
TopDocs topDocs = searcher.search(query, 10);
System.out.println(topDocs.totalHits);
It fetches the relevant documents correctly.
If I build the query using QueryParser.parse, it doesn't work:
Query query = new QueryParser(Version.LUCENE_8_11_0.toString(), new StandardAnalyzer()).parse("field1:[0 TO 40]");
I checked the string representations of both queries and they look identical, as below:
field1:[0 TO 40]
Does anyone know what I am doing wrong?
An IntPoint field requires a custom query parser. The following solves the problem:
StandardQueryParser parser = new StandardQueryParser();
parser.setAnalyzer(new StandardAnalyzer());
PointsConfig pointsConfig = new PointsConfig(new DecimalFormat(), Integer.class);
Map<String, PointsConfig> pointsConfigMap = new HashMap<>();
pointsConfigMap.put("field1", pointsConfig);
parser.setPointsConfigMap(pointsConfigMap);
Query query = parser.parse("field1:[0 TO 40]", "field1");

Getting only aggregations results in elasticsearch

I am doing an aggregation like this:
{
  "size": 0,
  "aggs": {
    "my_aggs": {
      "terms": {
        "field": "my_field"
      }
    }
  }
}
I want to get only the aggregation results. When I set size(0) as below, it gives an error; I later learned this parameter controls how many aggregation buckets are returned. So, how can I get only the aggregation results (no hit documents in the response)?
AbstractAggregationBuilder aggregationBuilder = AggregationBuilders
        .terms("my_aggs")
        .field("my_field")
        .size(0) // how to set size for my purpose?
        .order(BucketOrder.key(true));
Moreover, if I get thousands of aggregation results, does this query return all of them, or does it apply the default size of 10? If not, how do I know what size to set for the aggregation results?
Edit: I am adding my aggregation like this:
SearchQuery searchQuery = new NativeSearchQueryBuilder()
        .withIndices(indexName)
        .withTypes(typeName)
        .withQuery(boolQueryBuilder)
        .addAggregation(aggregationBuilder)
        .build();
Please help.
You do that on the search query instance, not at the aggregation level:
AbstractAggregationBuilder aggregationBuilder = AggregationBuilders
        .terms("my_aggs")
        .field("my_field")
        .order(BucketOrder.key(true));
SearchQuery searchQuery = new NativeSearchQueryBuilder()
        .withIndices(indexName)
        .withTypes(typeName)
        .withQuery(boolQueryBuilder)
        .withPageable(new PageRequest(0, 1)) // <--- add this
        .addAggregation(aggregationBuilder)
        .build();
As far as I know, Spring Data ES doesn't allow you to create a PageRequest with size 0 (which is why I picked 1). If that's a problem, this answer shows how to override that behavior.
By default, your aggregation will return 10 buckets, but you can increase the size up to 10000, if needed.
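As a sketch in raw query DSL (same field as above), the bucket count is raised by setting size inside the terms aggregation, while the top-level "size": 0 suppresses the hit documents:

```json
{
  "size": 0,
  "aggs": {
    "my_aggs": {
      "terms": {
        "field": "my_field",
        "size": 10000
      }
    }
  }
}
```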

JEST Bulk Request Issue

I am trying to run a bulk request through Jest. I want to append my data (say "bills") one at a time and then execute them all at once. However, when I run the following code on 10 bills, only the last bill gets executed. Can someone correct this code so that it executes all 10 bills in a single bulk request (i.e. executed outside the for loop)?
for (JSONObject bill : bills) {
    bulkRequest = new Bulk.Builder()
            .addAction(new Index.Builder(bill.toString()).index(index).type(type).id(id).build())
            .build();
}
bulkResponse = Client.execute(bulkRequest);
You need to build the Bulk.Builder outside the loop and then use it to add all bills:
bulkRequest = new Bulk.Builder();
for (JSONObject bill : bills) {
    bulkRequest.addAction(new Index.Builder(bill.toString()).index(index).type(type).id(id).build());
}
bulkResponse = Client.execute(bulkRequest.build());
I know it's an old question, but in case someone stumbles across this, here is a Java 8 (lambdas) way of doing the same thing:
Client.execute(new Bulk.Builder()
        .addAction(bills.stream()
                .map(bill -> new Index.Builder(bill.toString())
                        .index(index).type(type).id(id).build())
                .collect(Collectors.toList()))
        .build());

Elasticsearch Java API - Bool Query Operator

I am using Elasticsearch 2.4.3 in my Spring Boot app and use the following query:
QueryBuilder qb = new BoolQueryBuilder()
        .must(QueryBuilders.multiMatchQuery(term, "phoneticFirstName", "phoneticLastName", "phoneticLocationName", "phoneticCompanyName")
                .analyzer("atsCustomSearchAnalyzer")
                .operator(Operator.AND))
        .must(QueryBuilders.multiMatchQuery(term, "ngramFirstName^3", "ngramLastName^3", "ngramLocationName^3", "ngramCompanyName^3", "_all")
                .analyzer("atsCustomSearchAnalyzer")
                .operator(Operator.AND));
I want to get a response where either the first query or the second query gets hits. Can you help me change that in my code, please?
UPDATE
"atsCustomPhoneticAnalyzer":{
"type":"custom",
"tokenizer":"whitespace",
"filter":["lowercase","asciifolding","atsPhoneticFilter"]
},
"atsCustomSearchAnalyzer":{
"type":"custom",
"tokenizer":"whitespace",
"filter":["lowercase","asciifolding","umlautStemmer","germanStemmer"]
}
UPDATE #2
QueryBuilder qb = new BoolQueryBuilder()
        .should(QueryBuilders.multiMatchQuery(term, "ngramFirstName", "ngramLastName", "ngramLocationName", "ngramCompanyName")
                .type(Type.CROSS_FIELDS)
                .analyzer("atsCustomSearchAnalyzer")
                .operator(Operator.AND)
                .boost(3))
        .should(QueryBuilders.multiMatchQuery(term, "phoneticLastName")
                .analyzer("atsCustomPhoneticAnalyzer")
                .operator(Operator.AND))
        .should(QueryBuilders.matchQuery("_all", term) // matchQuery takes (fieldName, value)
                .analyzer("atsCustomSearchAnalyzer")
                .operator(Operator.AND))
        .minimumNumberShouldMatch(1);
I have 2 indices: persons and activities. When I comment out the second query, I get hits from both persons and activities. If all 3 queries are present, the hits from activities are gone.
Any ideas?
Simply change must to should and add minimumNumberShouldMatch(1):
QueryBuilder qb = new BoolQueryBuilder()
        .minimumNumberShouldMatch(1)
        .should(QueryBuilders.multiMatchQuery(term, "phoneticFirstName", "phoneticLastName", "phoneticLocationName", "phoneticCompanyName")
                .analyzer("atsCustomSearchAnalyzer")
                .operator(Operator.AND))
        .should(QueryBuilders.multiMatchQuery(term, "ngramFirstName^3", "ngramLastName^3", "ngramLocationName^3", "ngramCompanyName^3", "_all")
                .analyzer("atsCustomSearchAnalyzer")
                .operator(Operator.AND));
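For reference, a sketch of roughly what this builder produces in raw query DSL (field and analyzer names taken from the question):

```json
{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {
          "multi_match": {
            "query": "<term>",
            "fields": ["phoneticFirstName", "phoneticLastName", "phoneticLocationName", "phoneticCompanyName"],
            "analyzer": "atsCustomSearchAnalyzer",
            "operator": "and"
          }
        },
        {
          "multi_match": {
            "query": "<term>",
            "fields": ["ngramFirstName^3", "ngramLastName^3", "ngramLocationName^3", "ngramCompanyName^3", "_all"],
            "analyzer": "atsCustomSearchAnalyzer",
            "operator": "and"
          }
        }
      ]
    }
  }
}
```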

Grouping Solr results in Solr 3.6.1 API causes NullPointerException when parsing result

As long as I limit my query to:
SolrQuery solrQuery = new SolrQuery();
solrQuery.set("q", query); //where query is solr query string (e.g. *:*)
solrQuery.set("start", 0);
solrQuery.set("rows", 10);
everything works fine - results are returned and so on.
Things get worse when I try to group results by my field "token_group" to avoid duplicates:
SolrQuery solrQuery = new SolrQuery();
solrQuery.set("q", query); //where query is solr query string (e.g. *:*)
solrQuery.set("start", 0);
solrQuery.set("rows", 10);
solrQuery.set("group", true);
solrQuery.set("group.field", "token_group");
solrQuery.set("group.ngroups", true);
solrQuery.set("group.limit", 20);
Using this with HttpSolrServer, no exceptions are thrown, but trying to access the results ends in an NPE.
My querying Solr method:
public SolrDocumentList query(SolrQuery query) throws SolrServerException {
    QueryResponse response = this.solr.query(query); // this.solr is a handle to HttpSolrServer
    SolrDocumentList list = response.getResults();
    return list;
}
Note that similar grouping (using the very same field) is done in our other apps (PHP) and works fine, so this is not a schema issue.
I solved my issue. In case someone needs this in the future:
When you perform a grouped query, you need different methods to fetch and parse the results. While for ungrouped queries
QueryResponse response = this.solr.query(query); // this.solr is a handle to HttpSolrServer
SolrDocumentList list = response.getResults();
will work, for grouped queries it won't.
So, how do I build and parse the query?
The code below for building the query is perfectly fine:
SolrQuery solrQuery = new SolrQuery();
solrQuery.set("q", query); //where query is solr query string (e.g. *:*)
solrQuery.set("start", 0);
solrQuery.set("rows", 10);
solrQuery.set("group", true);
solrQuery.set("group.field", "token_group");
solrQuery.set("group.ngroups", true);
solrQuery.set("group.limit", 20);
where the last four lines tell Solr to group the results and set the grouping parameters. Here group.limit defines the maximum number of results within each group, while rows defines the maximum number of groups returned.
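As a sketch, the same settings expressed as raw Solr request parameters (assuming the default select handler) would be:

```text
/select?q=*:*&start=0&rows=10&group=true&group.field=token_group&group.ngroups=true&group.limit=20
```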
Making the grouped query looks like this:
List<GroupCommand> groupCommands = this.solr.query(query).getGroupResponse().getValues();
Referring to the documentation, GroupCommand contains info about the grouping as well as the list of results, divided into groups.
Okay, I want to get to the results. How to do it?
Well, in my example there's only one entry in List<GroupCommand> groupCommands, so to get the list of found groups within it:
GroupCommand groupCommand = groupCommands.get(0);
List<Group> groups = groupCommand.getValues();
This results in a list of groups. Each group contains its own SolrDocumentList. To get it:
for (Group g : groups) {
    SolrDocumentList groupList = g.getResult();
    // ...
}
Having this, we'll just proceed with the SolrDocumentList for each group.
I used a grouping query to get a list of distinct results. How to do that?
This was exactly my case. It seems easy, but there's a tricky part that can catch you if you're refactoring already-running code that uses getNumFound() from SolrDocumentList.
Just analyze my code:
/**
 * Gets a distinct result list from a grouped query
 *
 * @param query
 * @return results list
 * @throws SolrServerException
 */
public SolrDocumentList queryGrouped(SolrQuery query) throws SolrServerException {
    List<GroupCommand> groupCommands = this.solr.query(query).getGroupResponse().getValues();
    GroupCommand groupCommand = groupCommands.get(0);
    List<Group> groups = groupCommand.getValues();
    SolrDocumentList list = new SolrDocumentList();
    if (groups.size() > 0) {
        long totalNumFound = groupCommand.getNGroups();
        int iteratorLimit = 1;
        for (Group g : groups) {
            SolrDocumentList groupList = g.getResult();
            list.add(groupList.get(0));
            // I wanted to limit the list to 10 records
            if (iteratorLimit++ >= 10) {
                break;
            }
        }
        list.setNumFound(totalNumFound);
    }
    return list;
}
