How can I implement First_value and Last_value SQL functions in Elasticsearch? - java

The Elasticsearch Java API supports AggregationBuilder for sum, min, max, avg, and count. What about First/First_value and Last/Last_value? How can I implement these functions?
Here is the reference to the documentation: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/_metrics_aggregations.html

You can use a top_hits aggregation in your Elasticsearch query: sort on the field that defines the order and keep a single hit.
"aggs": {
  "FIRST_VALUE": {
    "top_hits": {
      "size": 1,
      "sort": [
        {
          // The key below is the field to sort on.
          // This example assumes there is a "my_date" field in your index.
          "my_date": {
            "order": "asc"
          }
        }
      ]
    }
  },
  "LAST_VALUE": {
    "top_hits": {
      "size": 1,
      "sort": [
        {
          "my_date": {
            "order": "desc"
          }
        }
      ]
    }
  }
}
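Since the question asks about Java, here is a minimal sketch that builds the request body above as a string, so it can be sent with whatever HTTP or REST client is already in use. The class and method names are hypothetical, the `"size": 0` at the top level is an added assumption (it only suppresses the regular hit list), and the sort field is passed in rather than fixed to `my_date`.

```java
class TopHitsBodySketch {
    /** Builds one top_hits aggregation that keeps a single hit, sorted on sortField. */
    static String topHits(String aggName, String sortField, String order) {
        return String.format(
            "\"%s\":{\"top_hits\":{\"size\":1,\"sort\":[{\"%s\":{\"order\":\"%s\"}}]}}",
            aggName, sortField, order);
    }

    /** Search body holding both aggregations: ascending for FIRST_VALUE, descending for LAST_VALUE. */
    static String firstAndLastBody(String sortField) {
        return "{\"size\":0,\"aggs\":{"
            + topHits("FIRST_VALUE", sortField, "asc") + ","
            + topHits("LAST_VALUE", sortField, "desc") + "}}";
    }

    public static void main(String[] args) {
        System.out.println(firstAndLastBody("my_date"));
    }
}
```

The field name is inserted without escaping, so this is only suitable for trusted, fixed field names.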

Related

How to search with a case condition in Elasticsearch

Suppose there are age and name fields.
When I search, I want the age to be 30 or more if the name is A, and 20 or more for everyone else. In other words, I want to apply different conditions depending on a field value.
Does ES provide this feature? I would like to know which keywords to use.
If possible, please also tell me how to do it with the QueryBuilders provided by Spring.
Thank you.
select * from people
where (name = 'a' and age >=30) or age >=20
This site can convert SQL to Elasticsearch DSL.
Try this:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match_phrase": {
"name": {
"query": "a"
}
}
},
{
"range": {
"age": {
"gte": 30
}
}
}
]
}
},
{
"range": {
"age": {
"gte": 20
}
}
}
]
}
},
"from": 0,
"size": 1
}
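It may be worth checking the boolean logic in plain Java before relying on the query. The sketch below is not Elasticsearch code; it compares the condition as written in the SQL above with what the question seems to ask for (an assumption about the intent). The class and method names are hypothetical.

```java
class CaseConditionSketch {
    /** The condition as written: (name = 'a' AND age >= 30) OR age >= 20. */
    static boolean asWritten(String name, int age) {
        return ("a".equals(name) && age >= 30) || age >= 20;
    }

    /** Assumed intent: name 'a' needs age >= 30, every other name needs age >= 20. */
    static boolean perName(String name, int age) {
        return "a".equals(name) ? age >= 30 : age >= 20;
    }

    public static void main(String[] args) {
        System.out.println(asWritten("a", 25)); // true: the second OR branch lets it through
        System.out.println(perName("a", 25));   // false
    }
}
```

If the second behaviour is the one wanted, the second should branch would also need to exclude name "a", for example by wrapping the range in a bool with a must_not clause on the name.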

How to get data from mongodb with duplicated parameters?

I am trying to create a criteria query that fetches items from the database.
Here is the code that fetches items from MongoDB:
public List<Location> findByListOfId(List<String> locationsIds) {
    Query query = new Query();
    query.addCriteria(Criteria.where("id").in(locationsIds));
    return template.find(query, Location.class);
}
Here is the Location class definition:
@Document("loaction")
@Data
public class Location {
    @Id
    private String id;
    private long order;
    private Date createdAt;
    private Date updatedAt;
}
And here is the value of the input (List locationsIds) passed to findByListOfId:
List<String> locationsIds = Arrays.asList("5d4eee8047206b6d2df212bb", "5d4eee8047206b6d2df212bb", "5d4eee8047206b6d2df212bb");
As you can see, the input contains the same value three times.
The result that I get from findByListOfId is a single item with id equal to 5d4eee8047206b6d2df212bb,
while I need as many items as there are occurrences of that id in the input (in my case I expect 3 fetched items with id = 5d4eee8047206b6d2df212bb).
Any idea how this query can be created?
Not sure why you want to do it, but you can do it this way (in MongoDB query language; you can then translate it to Java).
db.collection.aggregate([
{
$match: {
key: {
$in: [
"5d4eee8047206b6d2df212bb",
"5d4eee8047206b6d2df212bb",
"5d4eee8047206b6d2df212bb"
]
}
}
},
{
"$addFields": {
"itemsArray": [
"5d4eee8047206b6d2df212bb",
"5d4eee8047206b6d2df212bb",
"5d4eee8047206b6d2df212bb"
]
}
},
{
"$unwind": "$itemsArray"
},
])
Using an aggregation pipeline, you add the array as a field with $addFields and then $unwind it, which repeats each matched document once per array element.
I agree with others it's not something you want to do in production code, but I find the question interesting.
@Yahya's answer works on the assumption that the $match stage returns exactly 1 document.
A more generic pipeline fetches the exact number of documents regardless of how unique the key is and how many duplicates are in the query (https://mongoplayground.net/p/546QnaFn4lV):
db.collection.aggregate([
{
$limit: 1
},
{
$project: {
_id: 1,
list: [
"5d4eee8047206b6d2df212bb",
"5d4eee8047206b6d2df212bb",
"6d4eee8047206b6d2df212bc",
"7d4eee8047206b6d2df212bd"
]
}
},
{
"$unwind": "$list"
},
{
"$lookup": {
"from": "collection",
"localField": "list",
"foreignField": "key",
"as": "match"
}
},
{
$project: {
match: {
$cond: [
{
$eq: [
"$match",
[]
]
},
[
{
_id: null,
"key": "$list"
}
],
"$match"
]
}
}
},
{
"$replaceWith": {
$first: "$match"
}
}
])
The first $project passes the list of requested ids to mongo.
The last $project stage returns "null" for requested ids that don't have a matching document.
Here is an aggregation query that produces the required result.
Consider a collection with these documents:
{ _id: 1, a: 11 }
{ _id: 2, a: 22 }
{ _id: 3, a: 99 }
The query in the mongo shell, with the input ids:
var INPUT_IDS = [ 1, 2, 1, 1 ]
db.collection.aggregate([
{
$match: {
_id: { $in: INPUT_IDS }
}
},
{
$group: {
_id: null,
docs: { $push: "$$ROOT" }
}
},
{
$project: {
docs: {
$map: {
input: INPUT_IDS,
as: "inid",
in: {
$let: {
vars: {
matched: {
$filter: {
input: "$docs", as: "doc", cond: { $eq: [ "$$inid", "$$doc._id" ] }
}
}
},
in: { $arrayElemAt: [ "$$matched", 0 ] }
}
}
}
}
}
},
{
$unwind: "$docs"
},
{
$replaceWith: "$docs"
}
])
The output:
{ "_id" : 1, "a" : 11 }
{ "_id" : 2, "a" : 22 }
{ "_id" : 1, "a" : 11 }
{ "_id" : 1, "a" : 11 }
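The $map/$filter step above can also be done on the client after a plain $in query, which keeps the query itself simple. Here is a minimal Java sketch under that assumption: the `fetched` list stands in for the result of `template.find(query, Location.class)`, the `Location` class is reduced to its id, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DuplicateExpansionSketch {
    /** Minimal stand-in for the Location document; only the id matters here. */
    static final class Location {
        final String id;
        Location(String id) { this.id = id; }
    }

    /**
     * Returns one entry per requested id, in request order, so an id that is
     * requested three times yields its document three times.
     * Ids without a matching document are skipped.
     */
    static List<Location> expand(List<String> requestedIds, List<Location> fetched) {
        Map<String, Location> byId = new HashMap<>();
        for (Location location : fetched) {
            byId.put(location.id, location);
        }
        List<Location> result = new ArrayList<>();
        for (String id : requestedIds) {
            Location match = byId.get(id);
            if (match != null) {
                result.add(match);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String id = "5d4eee8047206b6d2df212bb";
        List<String> requested = Arrays.asList(id, id, id);
        List<Location> fetched = Arrays.asList(new Location(id)); // what the $in query returns
        System.out.println(expand(requested, fetched).size()); // prints 3
    }
}
```

The repeated entries are the same object instance, so copy them if the caller intends to modify each one independently.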

How to aggregate the final total sum in MongoDB from the sums calculated earlier?

How can I add a final total, computed from the prices already returned?
This is the original result:
[
{
"name": "a",
"prices": 10,
},
{
"name": "a",
"prices": 20,
}
]
But I need this:
[
{
"name": "a",
"prices": 10,
},
{
"name": "a",
"prices": 20,
},
// the extra document I need
{
"name": "total",
"total":30
}
]
$group by null to collect the root documents in a docs array and the total price in totalPrices
$project with $concatArrays to append a "total" document to the docs array
$unwind to deconstruct the docs array
$project to show the name and prices fields from each docs object
db.collection.aggregate([
{
$group: {
_id: null,
docs: { $push: "$$ROOT" },
totalPrices: { $sum: "$prices" }
}
},
{
$project: {
docs: {
$concatArrays: [
"$docs",
[
{
name: "total",
prices: "$totalPrices"
}
]
]
}
}
},
{ $unwind: "$docs" },
{
$project: {
_id: 0,
name: "$docs.name",
prices: "$docs.prices"
}
}
])
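For a small result set, the same shape can also be produced in application code after the query returns. This is a minimal Java sketch of what the pipeline above computes, not a translation to the MongoDB driver: rows are plain maps, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TotalRowSketch {
    /** Builds one row with the same two fields the final $project emits. */
    static Map<String, Object> row(String name, long prices) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", name);
        doc.put("prices", prices);
        return doc;
    }

    /** Sums "prices" over all rows and appends a row named "total", like $group + $concatArrays. */
    static List<Map<String, Object>> withTotal(List<Map<String, Object>> docs) {
        long total = 0;
        for (Map<String, Object> doc : docs) {
            total += ((Number) doc.get("prices")).longValue();
        }
        List<Map<String, Object>> out = new ArrayList<>(docs);
        out.add(row("total", total));
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = Arrays.asList(row("a", 10), row("a", 20));
        // prints [{name=a, prices=10}, {name=a, prices=20}, {name=total, prices=30}]
        System.out.println(withTotal(docs));
    }
}
```

As in the pipeline, the total row reuses the prices field rather than a separate total field.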

Union searches through elasticsearch and spring

Currently we search Elasticsearch with multiple requests.
What I want is this: given an index of fruits with the fields "calories", "name" and "family", I want the top 3 fruits (by calories) with family "a", the top 3 with "b" and the top 3 with "c".
Currently I search 3 times, with a query like:
{
"sort": [ {"calories": "desc"} ],
"query": {
"bool" : {
"must": [
{"term": { "family": "a" }} // second time "b", third time "c"...
]
}
},
"from": 0,
"size": 3
}
Using QueryBuilders.boolQuery().must(QueryBuilders.termQuery("family", "a"));
(The query above runs in a loop, so the second time it's "b", the third time "c".)
My question is whether I can do something similar to UNION from SQL, joining 3 results with family "a", 3 with family "b" and 3 with family "c". Showing how this would be done in Java (Spring Boot) would also be very helpful!
Thanks! If the description/explanation isn't good, please tell me, I'll try to elaborate.
You could perform a multi-search and do the UNION in Java (this is the better way, since you can rank the results easily).
Or use a bool should query to express the OR clauses:
"bool" : {
  "should": [
    {"term": { "family": "a" }},
    {"term": { "family": "b" }},
    {"term": { "family": "c" }}
  ]
}
But with this query it's hard to control how many results you get per family.
So another solution is to use a terms aggregation + top_hits:
(https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html)
{
"query": {
"match_all": {}
},
"aggs": {
"family": {
"terms": {
"field": "family"
},
"aggs": {
"top_sales_hits": {
"top_hits": {
"sort": [
{
"date": {
"order": "desc"
}
}
],
"_source": {
"includes": [
"date",
"price"
]
},
"size": 10
}
}
}
}
}
}
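Note: this is just an example, not a working solution. It sorts on "date" and returns "date" and "price"; for the fruits index you would sort on "calories" and set "size" to 3.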
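For the first option (one search per family, then the UNION in Java), the merging step can be sketched as below. This is a minimal in-memory sketch: `topByFamily` is a hypothetical stand-in for one Elasticsearch search sorted by calories with size 3, and the `Fruit` class and all names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class FamilyUnionSketch {
    static final class Fruit {
        final String name;
        final String family;
        final int calories;
        Fruit(String name, String family, int calories) {
            this.name = name;
            this.family = family;
            this.calories = calories;
        }
    }

    /** Stand-in for one search: the top n fruits of a family, highest calories first. */
    static List<Fruit> topByFamily(List<Fruit> index, String family, int n) {
        List<Fruit> hits = new ArrayList<>();
        for (Fruit fruit : index) {
            if (fruit.family.equals(family)) {
                hits.add(fruit);
            }
        }
        hits.sort(Comparator.comparingInt((Fruit f) -> f.calories).reversed());
        return new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
    }

    /** Concatenates the per-family result lists in the order the families are given. */
    static List<Fruit> union(List<Fruit> index, List<String> families, int n) {
        List<Fruit> all = new ArrayList<>();
        for (String family : families) {
            all.addAll(topByFamily(index, family, n));
        }
        return all;
    }

    public static void main(String[] args) {
        List<Fruit> index = Arrays.asList(
            new Fruit("f1", "a", 50), new Fruit("f2", "a", 90),
            new Fruit("f3", "a", 70), new Fruit("f4", "a", 10),
            new Fruit("f5", "b", 30));
        for (Fruit fruit : union(index, Arrays.asList("a", "b", "c"), 3)) {
            System.out.println(fruit.family + " " + fruit.name + " " + fruit.calories);
        }
    }
}
```

A family with fewer than three fruits simply contributes fewer rows, and the combined list can be re-sorted afterwards if a global ranking is needed.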

elasticsearch - Return the tokens of a field

How can I have the tokens of a particular field returned in the result?
For example, a GET request
curl -XGET 'http://localhost:9200/twitter/tweet/1'
returns
{
"_index" : "twitter",
"_type" : "tweet",
"_id" : "1",
"_source" : {
"user" : "kimchy",
"postDate" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search"
}
}
I would like to have the tokens of the '_source.message' field included in the result.
There is also another way to do it using the following script_fields script:
curl -H 'Content-Type: application/json' -XPOST 'http://localhost:9200/test-idx/_search?pretty=true' -d '{
"query" : {
"match_all" : { }
},
"script_fields": {
"terms" : {
"script": "doc[field].values",
"params": {
"field": "message"
}
}
}
}'
It's important to note that while this script returns the actual terms that were indexed, it also caches all field values and on large indices can use a lot of memory. So, on large indices, it might be more useful to retrieve field values from stored fields or source and reparse them again on the fly using the following MVEL script:
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import java.io.StringReader;
// Cache analyzer for further use
cachedAnalyzer=(isdef cachedAnalyzer)?cachedAnalyzer:doc.mapperService().documentMapper(doc._type.value).mappers().indexAnalyzer();
terms=[];
// Get value from Fields Lookup
//val=_fields[field].values;
// Get value from Source Lookup
val=_source[field];
if(val != null) {
tokenStream=cachedAnalyzer.tokenStream(field, new StringReader(val));
CharTermAttribute termAttribute = tokenStream.addAttribute(CharTermAttribute);
while(tokenStream.incrementToken()) {
terms.add(termAttribute.toString())
};
tokenStream.close();
}
terms
This MVEL script can be stored as config/scripts/analyze.mvel and used with the following query:
curl 'http://localhost:9200/test-idx/_search?pretty=true' -d '{
"query" : {
"match_all" : { }
},
"script_fields": {
"terms" : {
"script": "analyze",
"params": {
"field": "message"
}
}
}
}'
If you mean the tokens that have been indexed you can make a terms facet on the message field. Increase the size value in order to get more entries back, or set to 0 to get all terms.
Lucene provides the ability to store term vectors, but there's no way to access them through Elasticsearch for now (as far as I know).
Why do you need that? If you only want to check what you're indexing you can have a look at the analyze api.
Nowadays, it's possible with the Term vectors API:
curl 'http://localhost:9200/twitter/_termvectors/1?fields=message'
Result:
{
"_index": "twitter",
"_id": "1",
"_version": 1,
"found": true,
"took": 0,
"term_vectors": {
"message": {
"field_statistics": {
"sum_doc_freq": 4,
"doc_count": 1,
"sum_ttf": 4
},
"terms": {
"elastic": {
"term_freq": 1,
"tokens": [
{
"position": 2,
"start_offset": 11,
"end_offset": 18
}
]
},
"out": {
"term_freq": 1,
"tokens": [
{
"position": 1,
"start_offset": 7,
"end_offset": 10
}
]
},
"search": {
"term_freq": 1,
"tokens": [
{
"position": 3,
"start_offset": 19,
"end_offset": 25
}
]
},
"trying": {
"term_freq": 1,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 6
}
]
}
}
}
}
}
Note: Mapping types (here: tweet) have been removed in Elasticsearch 8.x (see migration guide).
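To make the position, start_offset and end_offset values in that response concrete, here is a rough Java stand-in for the analysis step. It is not the Lucene analyzer Elasticsearch uses; it assumes a simplified rule (split on runs of word characters, then lowercase), which is enough for this sample message. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenOffsetsSketch {
    static final class Token {
        final String term;
        final int position;
        final int startOffset;
        final int endOffset;
        Token(String term, int position, int startOffset, int endOffset) {
            this.term = term;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Splits on runs of word characters, lowercases each term, and records where it was found. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\w+").matcher(text);
        int position = 0;
        while (matcher.find()) {
            String term = matcher.group().toLowerCase(Locale.ROOT);
            tokens.add(new Token(term, position++, matcher.start(), matcher.end()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints: trying 0 0 6 / out 1 7 10 / elastic 2 11 18 / search 3 19 25 (one per line)
        for (Token t : tokenize("trying out Elastic Search")) {
            System.out.println(t.term + " " + t.position + " " + t.startOffset + " " + t.endOffset);
        }
    }
}
```

The term vectors response lists terms alphabetically, which is why "elastic" appears first there even though its position is 2.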
