Does anyone know the simplest way of achieving a union in Elasticsearch?
I have many documents with a field disease_name and a lat/lon field.
e.g.
{
"disease_name": "Flu",
"location": {
"lat": 41.12,
"lon": -71.34
}
}
I want to find all diseases that occur within all geo polygons.
(An OR query is not a suitable option as this may find any diseases that exist in just one area).
Finding diseases that occur in one or more polygons is easy enough, because that is just an OR query over multiple polygons followed by an Elasticsearch terms aggregation.
EDIT - Additional Information about the use case
I wish to perform a UNION and INTERSECT.
Programmatically, forgetting ES for a moment, the way I would achieve this is as follows:
Assume I have a large array of documents
[
{
"disease_name": "Flu",
"location": {
"lat": 41.12,
"lon": -71.34
}
},
...
]
Assume I have multiple geo buckets (geo polygons) defined.
For each disease, check whether it occurs in every bucket; if it does not, throw away its documents.
List all distinct disease_name values for the documents that remain.
I wish to avoid doing this programmatically if at all possible.
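For reference, the programmatic steps above can be sketched in Python as follows. This is only a minimal illustration of the logic, not an Elasticsearch solution: the function names are hypothetical, polygons are assumed to be lists of (lat, lon) vertices, and the point-in-polygon test is a plain ray-casting check that ignores edge cases such as points lying exactly on a boundary.

```python
from collections import defaultdict

def point_in_polygon(lat, lon, polygon):
    """Ray-casting test; polygon is a list of (lat, lon) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside

def diseases_in_all_polygons(docs, polygons):
    """Return the distinct disease names that occur in every polygon."""
    seen = defaultdict(set)  # disease_name -> indexes of polygons it occurs in
    for doc in docs:
        loc = doc["location"]
        for idx, polygon in enumerate(polygons):
            if point_in_polygon(loc["lat"], loc["lon"], polygon):
                seen[doc["disease_name"]].add(idx)
    return sorted(name for name, hits in seen.items() if len(hits) == len(polygons))
```

A disease is kept only when the set of polygons it was seen in covers all of them, which is the INTERSECT part; the final sorted list is the distinct disease_name values.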
Make sure you have the correct mapping (geo_point for the location field).
I think this might help:
POST x/diseases/_search
{
"size": 0,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"and": {
"filters": [
{
"geo_polygon": {
"location": {
"points": [
{
"lat": 40.73,
"lon": -74.1
},
{
"lat": 40.83,
"lon": -75.1
}
]
}
}
},
{
"geo_polygon": {
"location": {
"points": [
{
"lat": 40.73,
"lon": -74.1
},
{
"lat": 40.83,
"lon": -75.1
}
]
}
}
}
]
}
}
}
},
"aggs": {
"diseases": {
"terms": {
"field": "disease_name"
}
}
}
}
Here, you can change the coordinates and, if you want, add a similar filter for each additional polygon.
Hope this helps!!
Suppose my documents have age and name fields.
When I search, I want documents whose name is A to match only if the age is 30 or more, and all other documents to match if the age is 20 or more. In other words, I want to apply different conditions depending on a field value.
Does ES provide this functionality? I would like to know which keywords to use.
It would also help to know how to do this with the QueryBuilders provided by Spring.
Thank you.
select * from people
where (name = 'a' and age >=30) or age >=20
This site can convert SQL to Elasticsearch DSL.
Try this:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match_phrase": {
"name": {
"query": "a"
}
}
},
{
"range": {
"age": {
"gte": 30
}
}
}
]
}
},
{
"range": {
"age": {
"gte": 20
}
}
}
]
}
},
"from": 0,
"size": 1
}
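One caveat: the SQL above, as written, reduces to age >= 20, because every row that satisfies the first branch also satisfies the second. If the intent is the one described in the question (30 or more when the name is a, 20 or more otherwise), the second branch would need to exclude that name. Below is a minimal Python sketch that builds such a query body as a plain dict, using a must_not clause. The function names are hypothetical, and match_phrase on name is simply carried over from the generated query above; a term query may suit a keyword field better.

```python
def age_by_name_query(name, min_age_for_name, min_age_otherwise):
    """Build a bool query: stricter age limit for one name, looser for the rest."""
    name_clause = {"match_phrase": {"name": {"query": name}}}
    named = {"bool": {"must": [
        name_clause,
        {"range": {"age": {"gte": min_age_for_name}}},
    ]}}
    others = {"bool": {
        "must": [{"range": {"age": {"gte": min_age_otherwise}}}],
        "must_not": [name_clause],  # keep the special name out of this branch
    }}
    return {"query": {"bool": {"should": [named, others],
                               "minimum_should_match": 1}}}

def matches(person, name="a", min_age_for_name=30, min_age_otherwise=20):
    """Plain-Python mirror of the same condition, handy for checking the logic."""
    if person["name"] == name:
        return person["age"] >= min_age_for_name
    return person["age"] >= min_age_otherwise
```

With this shape, a document named a with age 25 no longer matches, while any other name with age 25 still does.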
I have data in the following format:
[{
"OrderId": "406-5309498-5972326",
"revenueHeader": {
"Principal": 982.14,
"Product Tax": 117.86,
"Total": 1100
}},
{
"OrderId": "506-5234568-5934567",
"revenueHeader": {
"Principal": 382.54,
"Product Tax": 34.46,
"Shipping charge": 30.5,
"Giftwrap charge": 27.5,
"Total": 234
}}]
How can I sum the revenueHeader map values for the keys across all the documents?
Note: "Shipping charge" is not present in the first document but we still want the sum for this key across all the documents. Also, the keys can vary, there is no way to know the name of keys beforehand.
You have to use $objectToArray to transform revenueHeader into an array of {k, v} objects.
You can then $unwind the array and group by revenueHeader.k, summing revenueHeader.v. This way, you never have to care about the field names or whether a given field is present inside revenueHeader.
Here's the query :
db.collection.aggregate([
{
$project: {
revenueHeader: {
$objectToArray: "$revenueHeader"
}
}
},
{
$unwind: "$revenueHeader"
},
{
$group: {
_id: "$revenueHeader.k",
total: {
$sum: "$revenueHeader.v"
}
}
}
])
You can test it here
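To make clear what the pipeline computes, here is a minimal Python sketch of the same reshaping done in memory. It is only an illustration of the logic, not a replacement for the server-side aggregation, and the function name is hypothetical.

```python
from collections import defaultdict

def sum_revenue_headers(orders):
    """Sum every revenueHeader key across all orders, whatever the keys are."""
    totals = defaultdict(float)
    for order in orders:
        # items() plays the role of $objectToArray followed by $unwind:
        # one (key, value) pair per entry of the map.
        for key, value in order.get("revenueHeader", {}).items():
            totals[key] += value  # the $group / $sum step
    return dict(totals)
```

Keys that are missing from some documents simply contribute nothing for those documents, which is why "Shipping charge" is still summed correctly.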
If you do not know in advance which fields you need to sum across documents, go with the solution given by @matthPen.
But if you know all the possible fields, you can use the $group stage directly with the key names. I personally avoid $unwind when the number of documents is very large.
[
{
$group: {
_id: null,
"Principal": { $sum: "$revenueHeader.Principal" },
"Product Tax": { $sum: "$revenueHeader.Product Tax" },
"Shipping charge": { $sum: "$revenueHeader.Shipping charge" },
"Giftwrap charge": { $sum: "$revenueHeader.Giftwrap charge" },
"Total": { $sum: "$revenueHeader.Total" },
}
}
]
I have someone putting JSON objects into Elasticsearch for which I do not know any fields. I would like to search all the fields for a given value using a matchQuery.
I understand that the _all field is deprecated, and copy_to doesn't work for me because I don't know which fields are available beforehand. Is there a way to accomplish this without knowing what fields to search for beforehand?
Yes, you can achieve this using a custom _all field (which I called my_all) and a dynamic template for your index. Basically, the idea is to have a generic mapping for all fields with a copy_to setting pointing to the my_all field. I've also added store: true for the my_all field, but only to show you that it works; in practice you won't need it.
So let's go and create the index:
PUT my_index
{
"mappings": {
"_doc": {
"dynamic_templates": [
{
"all_fields": {
"match": "*",
"mapping": {
"copy_to": "my_all"
}
}
}
],
"properties": {
"my_all": {
"type": "text",
"store": true
}
}
}
}
}
Then index a document:
PUT my_index/_doc/1
{
"test": "the cat drinks milk",
"age": 10,
"alive": true,
"date": "2018-03-21T10:00:00.123Z",
"val": ["data", "data2", "data3"]
}
Finally, we can search using the my_all field and also show its content (because we store its content) in addition to the _source of the document:
GET my_index/_search?q=my_all:cat&_source=true&stored_fields=my_all
And the result is shown below:
{
"_index": "my_index",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"test": "the cat drinks milk",
"age": 10,
"alive": true,
"date": "2018-03-21T10:00:00.123Z",
"val": [
"data",
"data2",
"data3"
]
},
"fields": {
"my_all": [
"the cat drinks milk",
"10",
"true",
"2018-03-21T10:00:00.123Z",
"data",
"data2",
"data3"
]
}
}
So as long as you can create the index and define its mapping yourself, you'll be able to search whatever people are sending to it.
I've recently taken up Jayway JsonPath and I've had trouble understanding how the in-path filtering works.
My JSON is structured as follows.
At the top level I have shareables. Each shareable has an array called user, whose entries contain an ID and a name, and an item called dataSet, which can contain any JSON.
Shareables can also be nested within a dataSet.
My working JSON looks like this:
{
"shareable": {
"user": [
{
"ID": 1,
"Name": "Bob"
},
{
"ID": 2,
"Name": "Charles"
}
],
"dataSet": [
{
"insulinMeasurement":
{
"timestamp": "Tuesday Morning",
"measurement": 174,
"unit": "pmol/L"
}
},
{
"insulinMeasurement":
{
"timestamp": "Tuesday Noon",
"measurement": 80,
"unit": "pmol/L"
}
},
{ "shareable": {
"user": [
{
"ID": 3,
"Name": "Jim"
}
],
"dataSet": [
{
"insulinMeasurement":
{
"timestamp": "Tuesday Evening",
"measurement": 130,
"unit": "pmol/L"
}
}
]
}
},
{ "unshareable": {
"user": [
{
"ID": 2,
"Name": "Bob"
}
],
"dataSet": [
{
"insulinMeasurement":
{
"timestamp": "Tuesday Night",
"measurement": 130,
"unit": "pmol/L"
}
}
]
}
}
]
}
}
What I want is all shareables that have a user with a certain ID. So I figured the path I would use would look like this:
$..shareable[?(@.user[*].ID == 1)]
which here has a hardcoded ID. This returns nothing, while
$..shareable[?(@.user[0].ID == 1)]
returns any shareable where the first user's ID is 1.
I also tried something along the lines of
$..shareable[?(@.user[?(@.ID == 1)])]
which I figured should return any shareable that has a user with an ID of 1.
Am I going about this the wrong way? Do I need to somehow iterate through the user objects that exist?
Well, I figured it out, so if anyone stumbles across this, the query should look as follows:
$..shareable[?( " + user + " in @.user[*].ID )]
where user is just the int of the userId. Basically, the right-hand side builds a list of all the IDs that the shareable contains, and the filter checks whether the requested ID is in that list.
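The membership check can also be written out by hand, which may help when reasoning about what the filter is supposed to match. The following Python sketch is an assumption-laden illustration, not Jayway's evaluation logic: the function name is hypothetical, and it assumes every shareable is an object whose user entries are objects with an ID key.

```python
def shareables_with_user(node, user_id):
    """Recursively collect every 'shareable' object that lists the given user ID."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "shareable" and isinstance(value, dict):
                # Same idea as "user in @.user[*].ID": build the ID list, test membership.
                ids = [u.get("ID") for u in value.get("user", [])]
                if user_id in ids:
                    found.append(value)
            # Keep descending so nested shareables inside dataSet are found too.
            found.extend(shareables_with_user(value, user_id))
    elif isinstance(node, list):
        for item in node:
            found.extend(shareables_with_user(item, user_id))
    return found
```

Objects stored under other keys, such as unshareable, are never matched themselves, although the search still descends into them.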
Currently we are searching through elastic with multiple requests.
For instance, given an index of fruits with the fields "calories", "name" and "family", I want the top 3 fruits (by calories) with family "a", the top 3 with family "b" and the top 3 with family "c".
Currently I would search 3 times, making a query look like:
{
"sort": [ {"calories": "desc"} ],
"query": {
"bool" : {
"must": [
{"term": { "family": "a" }} // second time "b", third time "c"...
]
}
},
"from": 0,
"size": 3
}
In Java I build it using QueryBuilders.boolQuery().must(QueryBuilders.termQuery("family", "a"));
(The query above runs in a loop, so the second time it's "b", the third time "c".)
My question is whether I can somehow get functionality similar to UNION in SQL, joining 3 results with family "a", 3 with family "b" and 3 with family "c". Knowing how to do this in Java (Spring Boot) would also be very helpful!
Thanks! If the description/explanation isn't good, please tell me, I'll try to elaborate.
You could perform a multi-search and do the UNION in Java (this is the better way, since you can then rank the results easily).
Or, use a bool should query to express the OR clauses:
"bool" : {
"should": [
{"term": { "family": "a" }},
{"term": { "family": "b" }},
{"term": { "family": "c" }}
]
}
But with this approach it's hard to control how many results you get per family.
So another solution is to use a terms aggregation + top_hits:
(https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html)
{
"query": {
"match_all": {}
},
"aggs": {
"family": {
"terms": {
"field": "family"
},
"aggs": {
"top_sales_hits": {
"top_hits": {
"sort": [
{
"date": {
"order": "desc"
}
}
],
"_source": {
"includes": [
"date",
"price"
]
},
"size": 10
}
}
}
}
}
}
Note: this is just an example, not a working solution.
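Adapted to the fruit example from the question, the request body might look like the sketch below, built here as a plain Python dict. This is a hedged illustration, not a tested solution: the aggregation names by_family and top_fruits and the function name are hypothetical, and it assumes family is mapped as a keyword field and calories as a numeric field.

```python
def top_n_per_family_body(families, n=3):
    """Build a search body returning the n highest-calorie fruits per family."""
    families = list(families)
    return {
        "size": 0,  # only the aggregation buckets are needed, not top-level hits
        "query": {"terms": {"family": families}},
        "aggs": {
            "by_family": {
                # One bucket per requested family.
                "terms": {"field": "family", "size": len(families)},
                "aggs": {
                    "top_fruits": {
                        "top_hits": {
                            "sort": [{"calories": {"order": "desc"}}],
                            "size": n,
                        }
                    }
                },
            }
        },
    }
```

Calling top_n_per_family_body(["a", "b", "c"]) yields a single request whose buckets each carry up to 3 hits, which replaces the three separate searches.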