I am doing an aggregation like this:
{
  "size": 0,
  "aggs": {
    "my_aggs": {
      "terms": {
        "field": "my_field"
      }
    }
  }
}
I want to get only the aggregation result. But when I set size(0) as below, it gives an error; I later learned that this size controls how many aggregation results (buckets) are returned. So how can I get only the aggregation results (no hit documents)?
AbstractAggregationBuilder aggregationBuilder = AggregationBuilders
.terms("my_aggs")
.field("my_field")
.size(0) //how to set size for my purpose?
.order(BucketOrder.key(true));
Moreover, if I get thousands of aggregation results, does this query return all of them, or does it apply the default size of 10? If not, how do I know what size to set for the aggregation results?
Edit: I am adding my aggregation like this:
SearchQuery searchQuery = new NativeSearchQueryBuilder()
.withIndices(indexName)
.withTypes(typeName)
.withQuery(boolQueryBuilder)
.addAggregation(aggregationBuilder)
.build();
Please help.
You do that at the search request level, not at the aggregation level:
AbstractAggregationBuilder aggregationBuilder = AggregationBuilders
.terms("my_aggs")
.field("my_field")
.order(BucketOrder.key(true));
SearchQuery searchQuery = new NativeSearchQueryBuilder()
.withIndices(indexName)
.withTypes(typeName)
.withQuery(boolQueryBuilder)
.withPageable(new PageRequest(0, 1)) <--- add this
.addAggregation(aggregationBuilder)
.build();
As far as I know, Spring Data ES doesn't allow you to create a PageRequest with size 0 (which is why I picked 1). If that's a problem, this answer shows you how to override that behavior.
By default, your aggregation will return 10 buckets, but you can increase the size up to 10000, if needed.
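If you need more buckets, that size is set on the aggregation builder itself; here is a minimal sketch (10000 is just an assumed upper bound matching the limit mentioned above, pick whatever fits your field's cardinality):
// size() here controls the number of buckets returned, not the number of hit documents
AbstractAggregationBuilder aggregationBuilder = AggregationBuilders
    .terms("my_aggs")
    .field("my_field")
    .size(10000) // assumption: adjust to the expected cardinality of my_field
    .order(BucketOrder.key(true));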
We have an Elasticsearch index that contains millions of records and we are using it for global searching. However, our query takes 2-4 seconds to return results. Can somebody help or advise how to further optimize the following query:
NativeSearchQueryBuilder query = new NativeSearchQueryBuilder()
    .withQuery(QueryBuilders.boolQuery()
        .should(QueryBuilders.termQuery("name", searchText))
        .should(QueryBuilders.termQuery("key", searchText))
        .should(QueryBuilders.termQuery("sort_alpha", searchText))
        .should(QueryBuilders.termQuery("sort_number", searchText)))
    .withPageable(PageRequest.of(0, 100))
    .withHighlightFields(new HighlightBuilder.Field("name"),
        new HighlightBuilder.Field("key"),
        new HighlightBuilder.Field("sort_alpha"),
        new HighlightBuilder.Field("sort_number"))
    .build();
I have an index in my Elasticsearch and I want a query that compares 2 date fields.
Assuming the field names are creationDate and modifiedDate, I want to get all documents in which these 2 dates are the same.
I know it was possible to use FilteredQuery, which is deprecated now, with something like the following code:
FilteredQueryBuilder query = QueryBuilders.filteredQuery(null,
    FilterBuilders.scriptFilter("doc['creationDate'].value == doc['modifiedDate'].value"));
It may also be possible to write manual scripts as strings, but I doubt that this is the right solution. Any ideas on how to create the proper query would be appreciated.
Filtered queries have been replaced by bool/filter queries. You can do it like this:
BoolQueryBuilder bqb = QueryBuilders.boolQuery()
    .filter(QueryBuilders.scriptQuery(
        new Script("doc['creationDate'].value == doc['modifiedDate'].value")));
However, instead of using scripts at search time, you'd be better off creating a new field at indexing time that contains the information of whether creationDate and modifiedDate are the same. Then you could simply check that flag at query time; it would be much more optimal and fast.
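For instance, a minimal sketch of computing such a flag before indexing (MyDoc, its setters, and myDocRepository are hypothetical names used only for illustration):
// Hypothetical indexing-time flag: compute it once per document instead of running a script per query
MyDoc doc = new MyDoc();                             // assumed POJO mapped to the index
doc.setCreationDate(creationDate);
doc.setModifiedDate(modifiedDate);
doc.setModified(!creationDate.equals(modifiedDate)); // true only if the document changed after creation
myDocRepository.save(doc);                           // or whatever indexing path you already use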
If you don't want to reindex all your data, you can update all documents with that flag by simply running an update by query like this:
POST my-index/_update_by_query
{
  "script": {
    "source": """
      def creationDate = Instant.parse(ctx._source.creationDate);
      def modifiedDate = Instant.parse(ctx._source.modifiedDate);
      ctx._source.modified = ChronoUnit.MICROS.between(creationDate, modifiedDate) > 0;
    """,
    "lang": "painless"
  },
  "query": {
    "match_all": {}
  }
}
And then your query will simply be
BoolQueryBuilder bqb = QueryBuilders.boolQuery()
    .filter(QueryBuilders.termQuery("modified", false));
I am using spring-data-mongodb and I want to use a cursor for an aggregate operation.
MongoTemplate.stream() takes a Query, so I tried creating the Aggregation instance, converting it to a DBObject using Aggregation.toDbObject(), creating a BasicQuery from that DBObject, and then invoking the stream() method.
This returns an empty cursor.
Debugging the spring-data-mongodb code shows that MongoTemplate.stream() uses the FindOperation, which makes me think spring-data-mongodb does not support streaming an aggregation operation.
Has anyone been able to stream the results of an aggregate query using spring-data-mongodb?
For the record, I can do it using the Java mongodb driver, but I prefer using spring-data.
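With the plain driver it looks roughly like this (a sketch, assuming a MongoCollection<Document> named collection and a hand-built pipeline):
// Plain MongoDB Java driver: aggregate() is backed by a server-side cursor
List<Document> pipeline = Arrays.asList(
        new Document("$match", new Document("type", new Document("$ne", "AType"))));
try (MongoCursor<Document> cursor = collection.aggregate(pipeline)
        .batchSize(100)   // fetch in batches instead of materializing everything
        .iterator()) {
    while (cursor.hasNext()) {
        System.out.println(cursor.next());
    }
}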
EDIT Nov 10th - adding sample code:
MatchOperation match = Aggregation.match(Criteria.where("type").ne("AType"));
GroupOperation group = Aggregation.group("name", "type");
group = group.push("color").as("colors");
group = group.push("size").as("sizes");
TypedAggregation<MyClass> agg = Aggregation.newAggregation(MyClass.class, Arrays.asList(match, group));
MongoConverter converter = mongoTemplate.getConverter();
MappingContext<? extends MongoPersistentEntity<?>, MongoPersistentProperty> mappingContext = converter.getMappingContext();
QueryMapper queryMapper = new QueryMapper(converter);
AggregationOperationContext context = new TypeBasedAggregationOperationContext(MyClass.class, mappingContext, queryMapper);
// create a BasicQuery to be used in the stream() method by converting the Aggregation to a DbObject
BasicQuery query = new BasicQuery(agg.toDbObject("myClass", context));
// spring-mongo attributes the stream() method to find() operations, not to aggregate() operations, so the stream returns an empty cursor
CloseableIterator<MyClass> iter = mongoTemplate.stream(query, MyClass.class);
// this is an empty cursor
while (iter.hasNext()) {
    System.out.println(iter.next().getName());
}
The following code, not using the stream() method, returns the expected non-empty result of the aggregation:
AggregationResults<HashMap> result = mongoTemplate.aggregate(agg, "myClass", HashMap.class);
For those who are still trying to find the answer to this:
From spring-data-mongo version 2.0.0.M4 onwards (AFAIK) MongoTemplate got an aggregateStream method.
So you can do the following:
AggregationOptions aggregationOptions = Aggregation.newAggregationOptions()
        // this is very important: if you do not set the batch size, you'll get all the objects at once
        // and you might run out of memory if the returned data set is too large
        .cursorBatchSize(mongoCursorBatchSize)
        .build();
CloseableIterator<YourClazz> data = mongoTemplate.aggregateStream(
        Aggregation.newAggregation(Aggregation.group("person_id").count().as("count"))
                .withOptions(aggregationOptions),
        collectionName, YourClazz.class);
I have a MongoRepository with the following method:
Position findFirstByDeviceIdAndSensorUsedIsIn(String deviceId, String[] sensorsUsed, Sort sort);
And I call it like this:
return posRepo.findFirstByDeviceIdAndSensorUsedIsIn(deviceId, VALID_SENSORS, new Sort(new Sort.Order(Sort.Direction.DESC, "Time"), new Sort.Order(Sort.Direction.DESC, "TimeReceived")));
VALID_SENSORS is a String-Array with 2 entries.
The problem now is that it sorts by Time, but the second dimension (TimeReceived) is random.
The TRACE-Output of the mongodb driver is:
com.mongodb.TRACE : find: company.position { "$query" : { "device_id" : "testId" , "sensor_used" : { "$in" : [ "CELL_LOCATE" , "GPS"]}} , "$orderby" : { "time" : -1 , "time_rcvd" : -1}}
When I try the following query with my Mongo client Robomongo, the order is correct. Here is the query:
db.getCollection('position').find({device_id:'testdevice'}).sort({time:-1,time_rcvd:-1}).limit(5)
What can cause this strange behavior?
EDIT:
I also tried the following code in my spring application:
TypedAggregation<Position> agg = newAggregation(Position.class,
match(where("deviceId").is(deviceId).andOperator(where("sensorUsed").in(VALID_SENSORS))),
sort(Sort.Direction.DESC, "time"),
sort(Sort.Direction.DESC, "timeReceived"),
limit(1)
);
AggregationResults<Position> result = template.aggregate(agg, Position.class);
But it does not work either! :(
You'll need to use the Aggregation framework, which was introduced in Mongo 2.2.
$unwind is the important operation, which duplicates each document in the aggregation pipeline, doing so once per array element. You'll want to unwind both time and time_rcvd, and then $sort them.
With the aggregation framework, you can do the following:
TypedAggregation<Position> agg = newAggregation(Position.class,
match(
where("deviceId").is(deviceId).andOperator(
where("sensorUsed").in(VALID_SENSORS)
)
),
sort(Sort.Direction.DESC, "time").and(Sort.Direction.DESC, "timeReceived"),
limit(1)
);
AggregationResults<Position> result = template.aggregate(agg, Position.class);
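Since the pipeline ends with limit(1), you can then read the single mapped result directly (small usage note):
// At most one mapped document because of limit(1); null means nothing matched
Position latest = result.getUniqueMappedResult();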
As long as I limit my query to:
SolrQuery solrQuery = new SolrQuery();
solrQuery.set("q", query); //where query is solr query string (e.g. *:*)
solrQuery.set("start", 0);
solrQuery.set("rows", 10);
everything works fine - results are returned and so on.
Things are getting worse when I try to group results by my field "Token_group" to avoid duplicates:
SolrQuery solrQuery = new SolrQuery();
solrQuery.set("q", query); //where query is solr query string (e.g. *:*)
solrQuery.set("start", 0);
solrQuery.set("rows", 10);
solrQuery.set("group", true);
solrQuery.set("group.field", "token_group");
solrQuery.set("group.ngroups", true);
solrQuery.set("group.limit", 20);
Using this with HttpSolrServer, no exceptions are thrown, but trying to access the results ends up in an NPE.
My querying Solr method:
public SolrDocumentList query(SolrQuery query) throws SolrServerException {
    QueryResponse response = this.solr.query(query); // this.solr is a handle to HttpSolrServer
    SolrDocumentList list = response.getResults();
    return list;
}
Note that similar grouping (using the very same field) is done in our other apps (PHP) and works fine, so this is not a schema issue.
I solved my issue. In case someone needs this in future:
When you perform a group query, you should use different methods to get and parse results.
While in ungrouped queries
QueryResponse response = this.solr.query(query); // this.solr is a handle to HttpSolrServer
SolrDocumentList list = response.getResults();
will work, when you want to query for groups, it won't.
So, how do I make and parse the query?
Below code for building query is perfectly fine:
SolrQuery solrQuery = new SolrQuery();
solrQuery.set("q", query); //where query is solr query string (e.g. *:*)
solrQuery.set("start", 0);
solrQuery.set("rows", 10);
solrQuery.set("group", true);
solrQuery.set("group.field", "token_group");
solrQuery.set("group.ngroups", true);
solrQuery.set("group.limit", 20);
where the last four lines tell Solr to group the results and set the grouping parameters. In this case, group.limit defines the maximum number of results within each group, and rows defines the maximum number of groups returned.
Making grouped query looks like this:
List<GroupCommand> groupCommands = this.solr.query(query).getGroupResponse().getValues();
Referring to the documentation, GroupCommand contains info about the grouping as well as the list of results, divided into groups.
Okay, I want to get to the results. How to do it?
Well, in my example there's only one entry in List<GroupCommand> groupCommands, so to get the list of found groups within it:
GroupCommand groupCommand = groupCommands.get(0);
List<Group> groups = groupCommand.getValues();
This will result in a list of groups. Each group contains its own SolrDocumentList. To get it:
for (Group g : groups) {
    SolrDocumentList groupList = g.getResult();
    // (...)
}
Having this, just proceed with the SolrDocumentList for each group.
I used a grouping query to get a list of distinct results. How to do it?
This was exactly my case. It seems easy, but there's a tricky part that can catch you if you're refactoring already-running code that uses getNumFound() from SolrDocumentList.
Just analyze my code:
/**
 * Gets a distinct result list from a grouped query
 *
 * @param query
 * @return results list
 * @throws SolrServerException
 */
public SolrDocumentList queryGrouped(SolrQuery query) throws SolrServerException {
    List<GroupCommand> groupCommands = this.solr.query(query).getGroupResponse().getValues();
    GroupCommand groupCommand = groupCommands.get(0);
    List<Group> groups = groupCommand.getValues();
    SolrDocumentList list = new SolrDocumentList();
    if (groups.size() > 0) {
        long totalNumFound = groupCommand.getNGroups();
        int iteratorLimit = 1;
        for (Group g : groups) {
            SolrDocumentList groupList = g.getResult();
            list.add(groupList.get(0));
            // I wanted to limit the list to 10 records
            if (iteratorLimit++ > 10) {
                break;
            }
        }
        list.setNumFound(totalNumFound);
    }
    return list;
}
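A small usage sketch of the method above (solrQuery is assumed to be built with the grouping parameters shown earlier):
// Distinct results, one representative document per group
SolrDocumentList distinct = queryGrouped(solrQuery);
System.out.println("distinct groups: " + distinct.getNumFound()); // ngroups, not the raw hit count
for (SolrDocument doc : distinct) {
    System.out.println(doc.getFieldValue("token_group"));
}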