Union searches through elasticsearch and spring

Currently we search Elasticsearch with multiple requests.
Suppose there is an index of fruits with the fields "calories", "name" and "family". I want the top 3 fruits by calories with family "a", the top 3 with family "b" and the top 3 with family "c".
Currently I search 3 times, with a query like:
{
  "sort": [ {"calories": "desc"} ],
  "query": {
    "bool" : {
      "must": [
        {"term": { "family": "a" }} // second time "b", third time "c"...
      ]
    }
  },
  "from": 0,
  "size": 3
}
In Java I build it with QueryBuilders.boolQuery().must(QueryBuilders.termQuery("family", "a"));
(The query above runs in a loop, so the second time it's "b" and the third time "c".)
My question: can I do something similar to UNION in SQL, joining 3 results with family "a", 3 with family "b" and 3 with family "c"? An example of how to do this in Java (Spring Boot) would be very helpful!
Thanks! If the description isn't clear, please tell me and I'll elaborate.

You could perform a multi-search and do the UNION in Java (this is the better way, since you can rank the results easily).
Or use a bool query with should clauses, which act as OR:
"bool" : {
  "should": [
    {"term": { "family": "a" }},
    {"term": { "family": "b" }},
    {"term": { "family": "c" }}
  ]
}
But with this query it's hard to control how many results come back per family.
So another solution is to use a terms aggregation with a top_hits sub-aggregation:
(https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html)
{
  "size": 0,
  "query": {
    "terms": { "family": ["a", "b", "c"] }
  },
  "aggs": {
    "family": {
      "terms": {
        "field": "family"
      },
      "aggs": {
        "top_fruits": {
          "top_hits": {
            "sort": [
              {
                "calories": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [
                "name",
                "calories"
              ]
            },
            "size": 3
          }
        }
      }
    }
  }
}
Note: this is adapted from the documentation example to the fruit index and is a sketch, not a tested solution. It assumes "family" is mapped as a keyword field.
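For the first option (doing the UNION in Java), the merge step could look roughly like the sketch below. It uses plain Java collections only; the Fruit class, the FruitUnion class and both method names are hypothetical stand-ins, and the per-family lists are assumed to be whatever your search responses were mapped to.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a fruit document returned by a search. */
final class Fruit {
    final String name;
    final String family;
    final int calories;

    Fruit(String name, String family, int calories) {
        this.name = name;
        this.family = family;
        this.calories = calories;
    }
}

final class FruitUnion {
    /** Keeps at most n fruits of one result list, highest calories first. */
    static List<Fruit> topByCalories(List<Fruit> hits, int n) {
        return hits.stream()
                .sorted(Comparator.comparingInt((Fruit f) -> f.calories).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    /** Concatenates the top n of every per-family list and ranks the union by calories. */
    static List<Fruit> union(List<List<Fruit>> perFamilyHits, int n) {
        List<Fruit> merged = new ArrayList<>();
        for (List<Fruit> hits : perFamilyHits) {
            merged.addAll(topByCalories(hits, n));
        }
        merged.sort(Comparator.comparingInt((Fruit f) -> f.calories).reversed());
        return merged;
    }
}
```

Calling FruitUnion.union(Arrays.asList(hitsA, hitsB, hitsC), 3) would then give at most 9 fruits, ranked across all three families.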

Related

How to search with a case condition in Elasticsearch

Suppose there are age and name fields.
When I search, I want the condition to depend on a field value: if the name is A, the age must be 30 or more; for all other names, the age must be 20 or more.
Does Elasticsearch provide this? I would like to know which keywords to use.
It would also help to know how to do this with the QueryBuilders provided by Spring.
Thank you.

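In SQL the condition would be:
select * from people
where (name = 'a' and age >= 30) or (name <> 'a' and age >= 20)
The second branch has to exclude name 'a'; otherwise a 25-year-old named 'a' would still match through it.
This site can convert SQL to Elasticsearch DSL. Try this:
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              { "match_phrase": { "name": { "query": "a" } } },
              { "range": { "age": { "gte": 30 } } }
            ]
          }
        },
        {
          "bool": {
            "must": [
              { "range": { "age": { "gte": 20 } } }
            ],
            "must_not": [
              { "match_phrase": { "name": { "query": "a" } } }
            ]
          }
        }
      ]
    }
  }
}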
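To make the intended rule concrete (age at least 30 when the name is A, at least 20 otherwise), here is the same condition as a plain Java predicate. The class and method names are made up for illustration; it can serve as a reference when checking which documents a query should return.

```java
final class AgeRule {
    /** Minimum age for a given name: 30 for "a", 20 for every other name. */
    static int minimumAge(String name) {
        return "a".equals(name) ? 30 : 20;
    }

    /** True when a person with this name and age satisfies the rule. */
    static boolean matches(String name, int age) {
        return age >= minimumAge(name);
    }
}
```

Note that a person named "a" aged 25 must not match, which is the case a plain "or age >= 20" clause would let through.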

Scala - lists of IDs of objects with duplicated values from a Spark dataset

I need to create lists of IDs for all objects that have identical parameters (same values and same quantity). I am looking for a solution that is more efficient than two nested loops and an if.
Object structure in the dataset:
case class MergedProduct(id: String,
products: List[Product])
case class Product(productUrl: String, productId: String)
Example of data in dataset:
[ {
"id": "ID1",
"products": [
{
"product": {
"productUrl": "SOMEURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
}
]
},
{
"id": "ID2",
"products": [
{
"product": {
"productUrl": "SOMEURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
}
]
},
{
"id": "ID3",
"products": [
{
"product": {
"productUrl": "DIFFERENTURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
}
]
},
{
"id": "ID4",
"products": [
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "DIFFERENTURL",
"productId": "1"
}
}
]
},
{
"id": "ID5",
"products": [
{
"product": {
"productUrl": "NOTDUPLICATEDURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "DIFFERENTURL",
"productId": "1"
}
}
]
}
]
In this example, 4 objects are duplicated (ID1 matches ID2, and ID3 matches ID4), so I would like to get their IDs in corresponding lists.
The expected output is a List[List[String]]:
List(List("ID1", "ID2"), List("ID3","ID4"))
I am looking for something efficient and readable: the dataset has nearly 700 million objects.
The only goal is to log that the duplicates exist, and I can remove them from the dataset afterwards (this does not affect the database). One idea was to take each MergedProduct in turn, search for other MergedProducts with identical Products, collect and log their IDs, remove them from the dataset, and move on until the whole dataset is checked. But that would require collecting the dataset into a list of MergedProducts first, which seems like a roundabout way to do it.

After trying some options and looking for a neat solution, I think this is reasonably OK:
private def getDuplicates(mergedProducts: List[MergedProduct]): List[List[String]] = {
  // Sort by (productUrl, productId) so that the order of products inside the list does not matter.
  val duplicates = mergedProducts.groupBy(_.products.sortBy(p => (p.productUrl, p.productId))).values.filter(_.size > 1).toList
  // Keep only the IDs of each group of duplicates.
  duplicates.map(group => group.map(_.id))
}
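The same group-by-normalized-key idea, sketched in Java for comparison. This is an in-memory illustration only, not a Spark job: the DuplicateFinder class is hypothetical, and each product is assumed to be encoded as a "productUrl|productId" string so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class DuplicateFinder {
    /**
     * @param productsById maps each MergedProduct id to its products,
     *                     each encoded as "productUrl|productId" (an assumption of this sketch)
     * @return one list of ids per group that shares exactly the same products
     */
    static List<List<String>> duplicateIds(Map<String, List<String>> productsById) {
        Map<List<String>, List<String>> idsByKey = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : productsById.entrySet()) {
            // Sorting a copy makes the key independent of element order
            // while still keeping repeated products (same value and quantity).
            List<String> key = new ArrayList<>(entry.getValue());
            Collections.sort(key);
            idsByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(entry.getKey());
        }
        return idsByKey.values().stream()
                .filter(ids -> ids.size() > 1)
                .collect(Collectors.toList());
    }
}
```

This is a single pass with one hash lookup per object instead of two nested loops. At the scale of the real dataset the grouping would have to run distributed rather than on a collected list.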

ElasticSearch range query in a paragraph

I have a text field called Description with data like:
This is a good thing for versions before 3.2 but bad for 3.5 and later
I want to run a range query on this type of text. I know that for a field containing only dates, numbers (such as age) or even string IDs, we can use queries like
{
"query": {
"range" : {
"age" : {
"gte" : 10,
"lte" : 20,
"boost" : 2.0
}
}
}
}
But I have a mixed field like the one above and need to run a range query on it. Also, I cannot change the index structure; I can only run queries or do some post-processing after retrieving the results. Does anyone have an idea how to run this type of query, or how to reach my goal in post-processing? I am using Java.

I hope I fully understand what you are looking for.
I've managed to create a simple working example.
Mappings
Using the char_group tokenizer. The Char Group Tokenizer documentation describes it as follows:
The char_group tokenizer breaks text into terms whenever it encounters a character which is in a defined set. It is mostly useful for cases where a simple custom tokenization is desired, and the overhead of use of the pattern tokenizer is not acceptable.
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "char_group",
"tokenize_on_chars": [
"letter",
"whitespace"
]
}
}
}
},
"mappings": {
"properties": {
"text": {
"type": "text",
"fields": {
"digit": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
}
}
Post a few documents
PUT my_index/_doc/1
{
"text": "This is a good thing for versions before 3.2 but bad for 3.5 and later"
}
PUT my_index/_doc/2
{
"text": "This is a good thing for versions before 5 but bad for 6 and later"
}
Search Query
GET my_index/_search
{
"query": {
"range": {
"text.digit": {
"gte": 3.2,
"lte": 3.5
}
}
}
}
Results
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"text" : "This is a good thing for versions before 3.2 but bad for 3.5 and later"
}
}
]
}
Another Search Query
GET my_index/_search
{
"query": {
"range": {
"text.digit": {
"gt": 3.5
}
}
}
}
Results
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"text" : "This is a good thing for versions before 5 but bad for 6 and later"
}
}
]
}
Analyze Query
Experiment with the following _analyze request until you get the desired tokens.
It already works with your example text:
This is a good thing for versions before 3.2 but bad for 3.5 and later
POST _analyze
{
"tokenizer": {
"type": "char_group",
"tokenize_on_chars": [
"letter",
"whitespace"
]
},
"text": "This is a good thing for versions before 3.2 but bad for 3.5 and later"
}
Hope this helps
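For the post-processing route mentioned in the question, a minimal Java sketch could extract the numbers from each returned Description and compare them numerically. This may be worth considering because a range query on a text field compares terms as strings, so a value like "10" sorts before "3.2". The class and method names below are hypothetical, and the regex treats every number as a plain decimal (a multi-part version such as 3.2.1 would need a real version comparison).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VersionRangeFilter {
    // Matches integers and decimals such as "5" or "3.2".
    private static final Pattern NUMBER = Pattern.compile("\\d+(?:\\.\\d+)?");

    /** Returns every number found in the text, in order of appearance. */
    static List<Double> extractNumbers(String text) {
        List<Double> numbers = new ArrayList<>();
        Matcher matcher = NUMBER.matcher(text);
        while (matcher.find()) {
            numbers.add(Double.parseDouble(matcher.group()));
        }
        return numbers;
    }

    /** True when at least one number in the text lies within [min, max]. */
    static boolean hasNumberInRange(String text, double min, double max) {
        for (double n : extractNumbers(text)) {
            if (n >= min && n <= max) {
                return true;
            }
        }
        return false;
    }
}
```

The search hits would then be filtered in Java, for example by keeping only those whose Description satisfies hasNumberInRange(description, 3.2, 3.5).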

Mongo query with multiple filters on same attribute doesn't work with JAVA

I'm new to MongoDB and created a query that fulfills my functional requirements:
db.collection.find(
{
"typeD": "ABC"
, "typeT": {$size: 2}
, "typeT": { $all: ["def", "abc"] }
, "typeC": { $size: 3}
, "typeC": { $all: ["pdf", "video", "png"] }
, "properties": {$size: 3}
,"properties":
{
$all:
[
{"$elemMatch": {"name": "propName1", "value": "proName1_value"} }
, {"$elemMatch": {"name": "propName2", "value": "proName2_value"} }
, {"$elemMatch": {"name": "propName3", "value": "proName3_value"} }
]
}
}
);
I want to find the documents whose arrays contain exactly the given elements. Since a fixed order of the elements inside the arrays cannot be assumed, I chose the $all operator, and to ensure exact matching I added an additional restriction with $size.
The above query runs in the mongo shell without any problems.
When I try to execute it from Java using mongoTemplate, I run into a problem:
BasicQuery query = new BasicQuery(queryString);
// mongoTemplate.find(...) returns a List of matching entities
List<CollectionEntity> existingCmc = this.mongoTemplate.find(query, CollectionEntity.class);
After the first Java line, query.toString() gives:
db.collection.find(
{
"typeD": "ABC"
, "typeT": { $all: ["def", "abc"] }
, "typeC": { $all: ["pdf", "video", "png"] }
,"properties":
{
$all:
[
{"$elemMatch": {"name": "propName1", "value": "proName1_value"} }
, {"$elemMatch": {"name": "propName2", "value": "proName2_value"} }
, {"$elemMatch": {"name": "propName3", "value": "proName3_value"} }
]
}
}
);
How can I execute the query that fulfills all of my requirements?
Can I rewrite the query so that it has a single condition per attribute?
How can I tell mongoTemplate not to "overwrite" the previous condition for this attribute?
Thanks in advance

MongoDB queries are effectively maps (key-value pairs).
Because you define the 'properties' key twice, the second definition overrides the first.
You can build the desired query with the '$and' operator; the same pattern applies to 'typeT' and 'typeC':
{ "$and": [
{ "properties": { "$size": 3 } },
{ "properties": { $all:
[
{"$elemMatch": {"name": "propName1", "value": "proName1_value"} },
{"$elemMatch": {"name": "propName2", "value": "proName2_value"} },
{"$elemMatch": {"name": "propName3", "value": "proName3_value"} }
]
} }
] }
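The overwrite can be reproduced with a plain java.util.Map, used here only as a stand-in for the parsed query document (it is not the MongoDB driver's own class). The class name and the placeholder values "x", "y", "z" are made up for this sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DuplicateKeyDemo {
    /** Flat form: the second put for "properties" replaces the first one. */
    static Map<String, Object> flat() {
        Map<String, Object> query = new LinkedHashMap<>();
        query.put("properties", Collections.singletonMap("$size", 3));
        query.put("properties", Collections.singletonMap("$all", Arrays.asList("x", "y", "z")));
        return query;
    }

    /** $and form: each condition lives in its own map, so both survive. */
    static Map<String, Object> withAnd() {
        List<Map<String, Object>> conditions = Arrays.asList(
                Collections.<String, Object>singletonMap(
                        "properties", Collections.singletonMap("$size", 3)),
                Collections.<String, Object>singletonMap(
                        "properties", Collections.singletonMap("$all", Arrays.asList("x", "y", "z"))));
        Map<String, Object> query = new LinkedHashMap<>();
        query.put("$and", conditions);
        return query;
    }
}
```

Printing flat() shows only the $all condition, while withAnd() keeps both the $size and the $all condition.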

ElasticSearch - fuzzy search java api results are not proper

I have indexed sample documents in Elasticsearch and am trying to search them with a fuzzy query. But I am not getting any results when I search using the Java fuzzy query API.
Here is my mapping script:
PUT productcatalog
{
"settings": {
"analysis": {
"analyzer": {
"attr_analyzer": {
"type": "custom",
"tokenizer": "letter",
"char_filter": [
"html_strip"
],
"filter": ["lowercase", "asciifolding", "stemmer_minimal_english"]
}
},
"filter" : {
"stemmer_minimal_english" : {
"type" : "stemmer",
"name" : "minimal_english"
}
}
}
},
"mappings": {
"doc": {
"properties": {
"values": {
"type": "text",
"analyzer": "attr_analyzer"
},
"catalog_type": {
"type": "text"
},
"catalog_id":{
"type": "long"
}
}
}
}
}
Here is my sample data:
PUT productcatalog/doc/1
{
"catalog_id" : "343",
"catalog_type" : "series",
"values" : "Activa Rooftop, valves, VG3000, VG3000FS, butterfly, ball"
}
PUT productcatalog/doc/2
{
"catalog_id" : "12717",
"catalog_type" : "product",
"values" : "Activa Rooftop, valves"
}
Here is my search script:
GET productcatalog/_search
{
"query": {
"match" : {
"values" : {
"query" : " activa rooftop VG3000",
"operator" : "and",
"boost": 1.0,
"fuzziness": 2,
"prefix_length": 0,
"max_expansions": 100
}
}
}
}
I get the following results for the above query:
{
"took": 239,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.970927,
"hits": [
{
"_index": "productcatalog",
"_type": "doc",
"_id": "1",
"_score": 0.970927,
"_source": {
"catalog_id": "343",
"catalog_type": "series",
"values": "Activa Rooftop, valves, VG3000, VG3000FS, butterfly, ball"
}
}
]
}
}
But if I use the Java API below for the same fuzzy search, I don't get any results.
Here is my Java API query for the fuzzy search:
QueryBuilder qb = QueryBuilders.boolQuery()
.must(QueryBuilders.fuzzyQuery("values", keyword).boost(1.0f).prefixLength(0).maxExpansions(100));
Update 1
I have tried the query below:
QueryBuilder qb = QueryBuilders.matchQuery(QueryBuilders.fuzzyQuery("values", keyword).boost(1.0f).prefixLength(0).maxExpansions(100));
But I cannot pass a QueryBuilder to matchQuery. The compiler reports: The method matchQuery(String, Object) in the type QueryBuilders is not applicable for the arguments (FuzzyQueryBuilder)

The Java query you posted is not a match query; it is a bool query with a fuzzy must clause. Use QueryBuilders.matchQuery("values", keyword) with .fuzziness(...) instead of boolQuery().must(QueryBuilders.fuzzyQuery(...)).
Update 1:
A fuzzy query is a term-level query, so the keyword is not analyzed, while a match query is a full-text query that analyzes the keyword first.
Also, don't forget that the default operator of a match query is OR; change it to AND with .operator(Operator.AND), as in your DSL query.
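The difference can be illustrated without Elasticsearch. The sketch below is a rough simulation, not the real engine: analyze() only splits on non-letters and lowercases (it ignores the stemmer and asciifolding filters from the mapping), distance() is plain Levenshtein without transpositions, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class FuzzyVsMatch {
    /** Rough stand-in for the analyzer: split on non-letters and lowercase. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^A-Za-z]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase());
            }
        }
        return tokens;
    }

    /** Classic Levenshtein edit distance between two strings. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    /** Term-level behaviour: the input is compared as one term against each indexed token. */
    static boolean fuzzyTermMatches(String term, List<String> indexedTokens, int maxEdits) {
        for (String token : indexedTokens) {
            if (distance(term, token) <= maxEdits) {
                return true;
            }
        }
        return false;
    }

    /** Full-text behaviour with AND: every analyzed input token must match some indexed token. */
    static boolean fuzzyMatchAll(String input, List<String> indexedTokens, int maxEdits) {
        for (String queryToken : analyze(input)) {
            if (!fuzzyTermMatches(queryToken, indexedTokens, maxEdits)) {
                return false;
            }
        }
        return true;
    }
}
```

With the tokens of document 1 and the keyword " activa rooftop VG3000", fuzzyTermMatches returns false because the whole phrase is far more than 2 edits away from any single token, whereas fuzzyMatchAll returns true because each analyzed word finds a match.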
