How to apply a conditional (CASE-style) filter in Elasticsearch - Java

Suppose my documents have age and name fields.
When I search, I want documents whose name is A to match only if the age is 30 or more, and all other documents to match if the age is 20 or more. In other words, I want to apply a different condition depending on a field's value.
Does Elasticsearch provide this? I would like to know which keywords to use.
If possible, please also show how to do it with the QueryBuilders provided by Spring.
Thank you.

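In SQL terms, the condition is:
select * from people
where (name = 'a' and age >= 30) or (name <> 'a' and age >= 20)
The second branch has to exclude name = 'a'; with a bare "or age >= 20" the whole condition collapses to age >= 20.
There are sites that convert SQL to the Elasticsearch query DSL. Try this:
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              { "match_phrase": { "name": { "query": "a" } } },
              { "range": { "age": { "gte": 30 } } }
            ]
          }
        },
        {
          "bool": {
            "must": [
              { "range": { "age": { "gte": 20 } } }
            ],
            "must_not": [
              { "match_phrase": { "name": { "query": "a" } } }
            ]
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 1
}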
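As a sanity check, here is a minimal plain-Java sketch of the rule as stated in the question (name A needs age 30 or more, everyone else 20 or more). The class and method names are hypothetical, and the case-insensitive name comparison is an assumption. It also shows why an unguarded "or age >= 20" branch is not enough: that form accepts a 25-year-old named A.

```java
final class AgeRule {
    // Rule from the question: the age threshold depends on the name.
    static boolean matches(String name, int age) {
        int minAge = "a".equalsIgnoreCase(name) ? 30 : 20; // assumption: case-insensitive name
        return age >= minAge;
    }

    // Unguarded form: the second branch also accepts name A, so this is just age >= 20.
    static boolean naive(String name, int age) {
        return ("a".equalsIgnoreCase(name) && age >= 30) || age >= 20;
    }

    public static void main(String[] args) {
        System.out.println(matches("a", 25)); // false
        System.out.println(naive("a", 25));   // true
        System.out.println(matches("b", 25)); // true
    }
}
```

With the Elasticsearch Java client, the same shape would be assembled from QueryBuilders.boolQuery() using should(...), must(...) and mustNot(...), with QueryBuilders.rangeQuery("age").gte(...) for the thresholds.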

Scala - Ids lists of objects with duplicated values from spark dataset

I need to create lists of IDs for all objects that have identical parameters (the same values and the same number of them). I am looking for a solution that is more efficient than two nested loops and an if.
Object structure in the dataset:
case class MergedProduct(id: String,
products: List[Product])
case class Product(productUrl: String, productId: String)
Example of data in dataset:
[ {
"id": "ID1",
"products": [
{
"product": {
"productUrl": "SOMEURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
}
]
},
{
"id": "ID2",
"products": [
{
"product": {
"productUrl": "SOMEURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
}
]
},
{
"id": "ID3",
"products": [
{
"product": {
"productUrl": "DIFFERENTURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
}
]
},
{
"id": "ID4",
"products": [
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "DIFFERENTURL",
"productId": "1"
}
}
]
},
{
"id": "ID5",
"products": [
{
"product": {
"productUrl": "NOTDUPLICATEDURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "DIFFERENTURL",
"productId": "1"
}
}
]
}
]
In this example, we have 4 objects that are duplicated, so I would like to get their ID in the corresponding lists.
Example output is List[List[String]]:
List(List("ID1", "ID2"), List("ID3","ID4"))
I am looking for something efficient and readable - the dataset we are talking about has nearly 700 million objects.
Removing the listed duplicates from the dataset is fine (it does not affect the database), because the only goal is to log that they exist. So I considered taking each MergedProduct in turn, searching for other MergedProducts with identical products, collecting their IDs, logging that they exist, removing those IDs from the dataset, and moving on until the whole dataset has been checked. But in that case I would first have to collect the dataset into a list of MergedProducts and then do all the operations on it, which seems like a roundabout approach.
After trying some options and looking for neat solutions, I think this is reasonably OK:
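private def getDuplicates(mergedProducts: List[MergedProduct]): List[List[String]] = {
  // Key each MergedProduct by its products in a canonical order, so the same
  // products listed in a different order still count as duplicates
  val duplicates = mergedProducts
    .groupBy(_.products.sortBy(p => (p.productId, p.productUrl)))
    .filter(_._2.size > 1).values.toList
  duplicates.map(_.map(_.id))
}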
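The same grouping idea, sketched in Java to match the other examples here: build an order-independent key from each object's products, group IDs by that key, and keep the groups with more than one ID. The class names are hypothetical stand-ins for the case classes above, the tab separator assumes tabs never occur in the values, and this is an in-memory sketch; for a dataset of the size mentioned, the same key would have to feed a distributed group-by instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class DuplicateIds {
    static final class Product {
        final String productUrl;
        final String productId;
        Product(String productUrl, String productId) {
            this.productUrl = productUrl;
            this.productId = productId;
        }
    }

    static final class MergedProduct {
        final String id;
        final List<Product> products;
        MergedProduct(String id, Product... products) {
            this.id = id;
            this.products = Arrays.asList(products);
        }
    }

    // Canonical key: one string per product, sorted, so list order does not matter.
    static List<String> key(MergedProduct m) {
        return m.products.stream()
                .map(p -> p.productId + "\t" + p.productUrl) // assumption: no tabs in values
                .sorted()
                .collect(Collectors.toList());
    }

    // Single pass: group IDs by key, then keep only groups with more than one ID.
    static List<List<String>> findDuplicates(List<MergedProduct> items) {
        Map<List<String>, List<String>> idsByKey = new LinkedHashMap<>();
        for (MergedProduct m : items) {
            idsByKey.computeIfAbsent(key(m), k -> new ArrayList<>()).add(m.id);
        }
        return idsByKey.values().stream()
                .filter(ids -> ids.size() > 1)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<MergedProduct> data = Arrays.asList(
                new MergedProduct("ID1", new Product("SOMEURL", "1"), new Product("SOMEOTHERURL", "1")),
                new MergedProduct("ID2", new Product("SOMEURL", "1"), new Product("SOMEOTHERURL", "1")),
                new MergedProduct("ID3", new Product("DIFFERENTURL", "1"), new Product("SOMEOTHERURL", "1")),
                new MergedProduct("ID4", new Product("SOMEOTHERURL", "1"), new Product("DIFFERENTURL", "1")),
                new MergedProduct("ID5", new Product("NOTDUPLICATEDURL", "1"), new Product("DIFFERENTURL", "1")));
        System.out.println(findDuplicates(data)); // [[ID1, ID2], [ID3, ID4]]
    }
}
```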

java : search for substring in elasticsearch

I'm trying to search for substrings in Elasticsearch, but what I've learned and coded so far doesn't match substrings the way I want.
Here's what I've coded:
// Wildcard query_string search on tagName (tagName holds the user's input)
BoolQueryBuilder query = new BoolQueryBuilder();
query.must(new QueryStringQueryBuilder("tagName : *" + tagName + "*"));
SearchResponse response = esclient.prepareSearch(index).setTypes(type)
        .setQuery(query)
        .execute().actionGet();
// Collect the tagName of every hit
SearchHit[] hits = response.getHits().getHits();
for (SearchHit hit : hits) {
    Map<String, Object> map = hit.getSource();
    list.add((String) map.get("tagName"));
}
// Drop duplicates, then copy the names into the JSON array
list = list.stream().distinct().collect(Collectors.toList());
for (int i = 0; i < list.size(); i++) {
    jsonArrayBuilder.add((String) list.get(i));
}
What I'm trying to implement is this: a tag should be listed even if only part of the given tag name matches it.
For example, if I'm looking for a tag named "social_security_number" and I type "social security", I would like it to be listed.
What actually happens is that if I leave out the underscore, the tag is not listed.
Is it possible to do this? Should I modify this code to search that way?
Here is my index structure:
POST arempris/emptagnames
{
"mappings" : {
"emptags":{
"properties": {
"employeeid": {
"type":"integer"
},
"tagName": {
"type": "text",
"fielddata": true,
"analyzer": "lowercase_keyword",
"search_analyzer": "lowercase_keyword"
}
}
}
}
}
I would greatly appreciate your help. Thanks a lot in advance.
The analyzer that you have set does not tokenize anything, so the space is significant. Specifying a custom analyzer that splits on whitespace, underscores and anything else you find useful is a good solution. The example below will work, but check carefully what the analyzer does and consult the documentation for every part you don't understand.
PUT stackoverflow
{
"settings": {
"analysis": {
"analyzer": {
"customanalyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"standard",
"generatewordparts"
]
}
},
"filter": {
"generatewordparts": {
"type": "word_delimiter",
"split_on_numerics": false,
"split_on_case_change": false,
"generate_word_parts": true,
"generate_number_parts": false,
"stem_english_possessive": false,
"catenate_all": false
}
}
}
},
"mappings": {
"emptags": {
"properties": {
"employeeid": {
"type": "integer"
},
"tagName": {
"type": "text",
"fielddata": true,
"analyzer": "customanalyzer",
"search_analyzer": "customanalyzer"
}
}
}
}
}
PUT stackoverflow/emptags/1
{
"employeeid": 1,
"tagName": "social_security_number"
}
GET stackoverflow/_analyze
{
"analyzer" : "customanalyzer",
"text" : "social_security_number123"
}
GET stackoverflow/_search
{
"query": {
"query_string": {
"default_field": "tagName",
"query": "*curi*"
}
}
}
Another solution would be to normalize your input and replace any symbol that you want to treat as a whitespace (e.g. underscore) with a whitespace.
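A minimal sketch of that normalization in plain Java. It assumes the same function is applied both to the values you index and to the search input, and the set of symbols treated as separators (underscore, hyphen, dot) is an assumption you would adjust. The class and method names are hypothetical.

```java
import java.util.Locale;

final class TagNormalizer {
    // Turn separator symbols into spaces, collapse repeated whitespace, lowercase.
    static String normalize(String input) {
        return input
                .replaceAll("[_\\-.]+", " ")   // assumption: these symbols act as separators
                .trim()
                .replaceAll("\\s+", " ")
                .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("social_security_number")); // social security number
        System.out.println(normalize("Social  Security"));       // social security
    }
}
```

After normalization, "social security" is a plain substring of "social security number", so the missing underscore no longer matters.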
Read more here:
http://nocf-www.elastic.co/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-normalizers.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-custom-analyzer.html

Jayway JsonPath filtering with predicates

I've recently taken up Jayway JsonPath, and I've had trouble understanding how in-path filtering works.
My JSON is structured like this:
At the top I have shareables. Each shareable has an array called user, whose entries contain an ID and a name, and an item called dataSet, which can contain any JSON.
Shareables can also be nested within the dataSet.
My working JSON looks like this:
{
"shareable": {
"user": [
{
"ID": 1,
"Name": "Bob"
},
{
"ID": 2,
"Name": "Charles"
}
],
"dataSet": [
{
"insulinMeasurement":
{
"timestamp": "Tuesday Morning",
"measurement": 174,
"unit": "pmol/L"
}
},
{
"insulinMeasurement":
{
"timestamp": "Tuesday Noon",
"measurement": 80,
"unit": "pmol/L"
}
},
{ "shareable": {
"user": [
{
"ID": 3,
"Name": "Jim"
}
],
"dataSet": [
{
"insulinMeasurement":
{
"timestamp": "Tuesday Evening",
"measurement": 130,
"unit": "pmol/L"
}
}
]
}
},
{ "unshareable": {
"user": [
{
"ID": 2,
"Name": "Bob"
}
],
"dataSet": [
{
"insulinMeasurement":
{
"timestamp": "Tuesday Night",
"measurement": 130,
"unit": "pmol/L"
}
}
]
}
}
]
}
}
What I want is all shareables that have a user with a certain ID. So I figured the path would look like this:
$..shareable[?(@.user[*].ID == 1)]
which here has a hardcoded ID. This returns nothing, while
$..shareable[?(@.user[0].ID == 1)]
returns any shareable where the first user's ID is 1.
I also tried something along the lines of
$..shareable[?(@.user[?(@.ID == 1)])]
which I figured should return any shareable that has a user with an ID of 1.
Am I going about this the wrong way? Do I need to somehow iterate through the user objects that exist?
Well, I figured it out, so if anyone stumbles across this, the query should look as follows:
"$..shareable[?(" + user + " in @.user[*].ID)]"
where user is just the int userId. Basically, the right-hand side creates a list of all the IDs that the shareable contains, and the filter checks whether the requested ID is in that list.
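A small sketch of assembling that path in Java. Jayway JsonPath refers to the current node as @; the class and method names here are hypothetical. Taking the ID as an int keeps arbitrary text out of the path expression.

```java
final class ShareablePath {
    // Builds the filter path for all shareables that list the given user ID.
    static String forUser(int userId) {
        return "$..shareable[?(" + userId + " in @.user[*].ID)]";
    }

    public static void main(String[] args) {
        System.out.println(forUser(1)); // $..shareable[?(1 in @.user[*].ID)]
    }
}
```

The resulting string would then be passed to JsonPath.read(json, path) from the com.jayway.jsonpath package.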

Union searches through elasticsearch and spring

Currently we search Elasticsearch with multiple requests.
For instance, given an index of fruits with the fields "calories", "name" and "family", I want the top 3 fruits (by calories) with family "a", the top 3 with family "b" and the top 3 with family "c".
Currently I would search 3 times, with a query like:
{
"sort": [ {"calories": "desc"} ],
"query": {
"bool" : {
"must": [
{"term": { "family": "a" }} // second time "b", third time "c"...
]
}
},
"from": 0,
"size": 3
}
built with QueryBuilders.boolQuery().must(QueryBuilders.termQuery("family", "a"));
(The query above runs in a loop, so the second time it's "b", the third time "c".)
My question: can I somehow get functionality similar to SQL's UNION, joining 3 results with family "a", 3 with family "b" and 3 with family "c"? An explanation of how to do this in Java (Spring Boot) would also be very helpful!
Thanks! If the description isn't clear, please tell me and I'll try to elaborate.
You could perform a multi-search and do the UNION in Java (this is the better way, since you can then rank the results easily).
Or use a bool should query to express OR clauses:
"bool" : {
"should": [
{"term": { "family": "a" }},
{"term": { "family": "b" }},
{"term": { "family": "c" }}
]
}
But then it's hard to control how many results you get per family.
So another solution is to use a terms aggregation with a top_hits sub-aggregation:
(https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html)
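{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "family": {
      "terms": {
        "field": "family"
      },
      "aggs": {
        "top_calorie_hits": {
          "top_hits": {
            "sort": [
              { "calories": { "order": "desc" } }
            ],
            "_source": {
              "includes": [ "name", "calories" ]
            },
            "size": 3
          }
        }
      }
    }
  }
}
Note: this is an untested sketch adapted from the documentation example, not a working solution. To restrict it to families "a", "b" and "c", replace match_all with the bool should query above.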
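To make the intended result concrete, here is a plain-Java sketch of the client-side variant: take the hits from all families, group them by family, and keep the top n by calories in each group. The Fruit class and the sample values are hypothetical, and mapping search hits into Fruit objects is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TopPerFamily {
    // Hypothetical hit type; the field names follow the question.
    static final class Fruit {
        final String name;
        final String family;
        final int calories;
        Fruit(String name, String family, int calories) {
            this.name = name;
            this.family = family;
            this.calories = calories;
        }
        @Override public String toString() { return name; }
    }

    // Groups hits by family and keeps the n highest-calorie fruits per family.
    static Map<String, List<Fruit>> topPerFamily(List<Fruit> hits, int n) {
        Map<String, List<Fruit>> byFamily = new TreeMap<>();
        for (Fruit f : hits) {
            byFamily.computeIfAbsent(f.family, k -> new ArrayList<>()).add(f);
        }
        for (List<Fruit> group : byFamily.values()) {
            group.sort(Comparator.comparingInt((Fruit f) -> f.calories).reversed());
            if (group.size() > n) {
                group.subList(n, group.size()).clear(); // drop everything past the top n
            }
        }
        return byFamily;
    }

    public static void main(String[] args) {
        List<Fruit> hits = Arrays.asList(
                new Fruit("a1", "a", 10), new Fruit("a2", "a", 20),
                new Fruit("a3", "a", 30), new Fruit("a4", "a", 40),
                new Fruit("b1", "b", 5));
        System.out.println(topPerFamily(hits, 3)); // {a=[a4, a3, a2], b=[b1]}
    }
}
```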

Elasticsearch geo point union or similar

Does anyone know the simplest way of achieving a union in Elasticsearch?
I have many documents with a field disease_name and a lat/lon field.
e.g.
{
"disease_name": "Flu",
"location": {
"lat": 41.12,
"lon": -71.34
}
}
I want to find all diseases that occur within every one of a set of geo polygons.
(An OR query is not suitable, as it would also find diseases that exist in just one area.)
The "find in one or more" case is easy enough: it is just an OR query over multiple polygons combined with an Elasticsearch terms aggregation.
EDIT - Additional Information about the use case
I wish to perform a UNION and INTERSECT.
Programmatically, forgetting ES for a moment, the way I would achieve this is as follows:
Assume I have a large array of documents
[
{
"disease_name": "Flu",
"location": {
"lat": 41.12,
"lon": -71.34
}
},
...
]
Assume I have multiple geo buckets (geo polygons) defined.
For each disease, check whether it exists in all buckets; if not, throw away its documents.
List all distinct disease_name values for the documents that remain.
I wish to avoid doing this programmatically if at all possible.
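For reference, the programmatic intersect described above can be sketched in plain Java as follows. It assumes the per-polygon lookup has already been done, so the input is one list of disease names per bucket; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class DiseaseIntersect {
    // bucketHits: for each geo bucket, the disease names of the documents found in it.
    // Returns the distinct names that occur in every bucket.
    static Set<String> inAllBuckets(List<List<String>> bucketHits) {
        if (bucketHits.isEmpty()) {
            return new TreeSet<>();
        }
        Set<String> common = new TreeSet<>(bucketHits.get(0));
        for (List<String> hits : bucketHits.subList(1, bucketHits.size())) {
            common.retainAll(new HashSet<>(hits)); // keep only names also seen in this bucket
        }
        return common;
    }

    public static void main(String[] args) {
        List<List<String>> buckets = Arrays.asList(
                Arrays.asList("Flu", "Measles", "Flu"),
                Arrays.asList("Flu", "Mumps"));
        System.out.println(inAllBuckets(buckets)); // [Flu]
    }
}
```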
Make sure you have the correct mapping (location mapped as geo_point).
I think this might help:
POST x/diseases/_search
{
"size": 0,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"and": {
"filters": [
{
"geo_polygon": {
"location": {
"points": [
{
"lat": 40.73,
"lon": -74.1
},
{
"lat": 40.83,
"lon": -75.1
}
]
}
}
},
{
"geo_polygon": {
"location": {
"points": [
{
"lat": 40.73,
"lon": -74.1
},
{
"lat": 40.83,
"lon": -75.1
}
]
}
}
}
]
}
}
}
},
"aggs": {
"diseases": {
"terms": {
"field": "disease_name"
}
}
}
}
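Here you can change the coordinates and, if you want, add similar filters for other polygons. The point lists are placeholders; a real polygon needs at least three points. Note also that the filtered query and the and filter were removed in Elasticsearch 5.0; on newer versions, put the geo_polygon clauses in the filter list of a bool query.
Hope this helps!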
