I'm trying to combine multiple queries in Elasticsearch using a boolean query, but the result is not what I'm expecting. For example:
If I have the following documents (among others):
DOC 1:
{
"name":"Iphone 5",
"product_suggestions":{
"input":[
"iphone 5",
"apple"
]
},
"description":"Iphone 5 - The almost last version",
"brand":"Apple",
"brand_facet":"Apple",
"state_id":"2",
"user_state_description":"Almost New",
"product_type_id":"1",
"current_price":350,
"finish_date":"2014/06/20 14:12",
"finish_date_ms":1403273520
}
DOC 2:
{
"name":"Apple II Lisa",
"product_suggestions":{
"input":[
"apple ii lisa",
"apple"
]
},
"description":"Make a offer and I Apple II Lisa!!",
"brand":"Apple",
"brand_facet":"Apple",
"state_id":"2",
"user_state_description":"Used",
"product_type_id":"1",
"current_price":150,
"finish_date":"2014/06/15 16:12",
"finish_date_ms":1402848720
}
DOC 3:
{
"name":"Iphone 5s",
"product_suggestions":{
"input":[
"iphone 5s",
"apple"
]
},
"description":"Iphone 5s 32Gb like new with a few scratches bla bla bla",
"brand":"Apple",
"brand_facet":"Apple",
"state_id":"1",
"user_state_description":"New",
"product_type_id":"2",
"current_price":510.1,
"finish_date":"2014/06/10 14:12",
"finish_date_ms":1402409520
}
DOC 4:
{
"name":"Iphone 4s",
"product_suggestions":{
"input":[
"iphone 4s",
"apple"
]
},
"description":"Iphone 4s 16Gb Mint conditions and unlocked to all network",
"brand":"Apple",
"brand_facet":"Apple",
"state_id":"1",
"user_state_description":"Almost New",
"product_type_id":"2",
"current_price":385,
"finish_date":"2014/06/12 16:12",
"finish_date_ms":1402589520
}
And if I run the following query (get all documents and facets matching the keyword "Apple" whose finish_date_ms is greater than 1402869581):
{
"from" : 1,
"size" : 20,
"query" : {
"bool" : {
"must" : {
"query_string" : {
"query" : "apple",
"default_operator" : "and",
"analyze_wildcard" : true
}
},
"must_not" : {
"range" : {
"finish_date_ms" : {
"from" : null,
"to" : 1402869581,
"include_lower" : true,
"include_upper" : false
}
}
}
}
},
"facets" : {
"brand" : {
"terms" : {
"field" : "brand_facet",
"size" : 10
}
},
"product_type_id" : {
"terms" : {
"field" : "product_type_id",
"size" : 10
}
},
"state_id" : {
"terms" : {
"field" : "state_id",
"size" : 10
}
}
}
}
This returns:
{
"took":5,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.18392482,
"hits":[
]
},
"facets":{
"brand":{
"_type":"terms",
"missing":0,
"total":1,
"other":0,
"terms":[
{
"term":"Apple",
"count":1
}
]
},
"product_type_id":{
"_type":"terms",
"missing":0,
"total":1,
"other":0,
"terms":[
{
"term":1,
"count":1
}
]
},
"state_id":{
"_type":"terms",
"missing":0,
"total":1,
"other":0,
"terms":[
{
"term":2,
"count":1
}
]
}
}
}
It should return only DOC 1. If I remove the range query, it returns all the documents that contain the word Apple. If I remove the "term" query, no documents are returned, so I presume the problem is in the range query.
Can anyone point me in the right direction with this?
One other important thing: this whole query is to be implemented in Java (if that helps).
Thanks!
(Sorry for this huge post.)
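I found my mistake (a newbie mistake, to be honest).
The problem was not in the range query but at the beginning of the JSON: the from field is set to 1, but the result contains only one record, so it should be 0. Since from is a zero-based offset, from = 1 skips the only matching hit, which is why hits.total is 1 while the hits array is empty.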
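To illustrate the offset behaviour described above, here is a minimal sketch in plain Java (no Elasticsearch client involved). The class and the `page` helper are hypothetical names used only for illustration; they merely mimic how a zero-based `from` and a `size` pick a window out of the matching hits.

```java
import java.util.List;

public class FromOffsetSketch {
    // Mimics how "from"/"size" select a window out of the matching hits.
    // "from" is a zero-based offset, so from = 1 skips the first hit.
    static <T> List<T> page(List<T> hits, int from, int size) {
        if (from >= hits.size()) {
            return List.of(); // offset is past the last hit: empty page
        }
        int end = Math.min(hits.size(), from + size);
        return hits.subList(from, end);
    }

    public static void main(String[] args) {
        List<String> matching = List.of("DOC 1"); // the single matching document
        System.out.println(page(matching, 1, 20)); // prints []
        System.out.println(page(matching, 0, 20)); // prints [DOC 1]
    }
}
```

With one matching document, `from = 1` yields an empty page even though the total count is still 1, which matches the response shown in the question.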
Thanks for everything!!
I'm new to elastic search.
So this is how the index looks:
{
"scresults-000001" : {
"aliases" : {
"scresults" : { }
},
"mappings" : {
"properties" : {
"callType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"code" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"data" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"esdtValues" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"gasLimit" : {
"type" : "long"
},
AND MORE OTHER Fields.......
I'm trying to create a search query in Java that looks like this:
{
"bool" : {
"filter" : [
{
"term" : {
"sender" : {
"value" : "sendervalue",
"boost" : 1.0
}
}
},
{
"term" : {
"data" : {
"value" : "YWRkTGlxdWlkaXR5UHJveHlAMDAwMDAwMDAwMDAwMDAwMDA1MDBlYmQzMDRjMmYzNGE2YjNmNmE1N2MxMzNhYjdiOGM2ZjgxZGM0MDE1NTQ4M0A3ZjE1YjEwODdmMjUwNzQ4QDBjMDU0YjcwNDhlMmY5NTE1ZWE3YWU=",
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
If I run this query, I get 0 hits. If I replace the "data" field with another field, it works. I don't understand what's different.
How I actually create the query in Java+SpringBoot:
QueryBuilder boolQuery = QueryBuilders.boolQuery()
.filter(QueryBuilders.termQuery("sender", "sendervalue"))
.filter(QueryBuilders.termQuery("data",
"YWRkTGlxdWlkaXR5UHJveHlAMDAwMDAwMDAwMDAwMDAwMDA1MDBlYmQzMDRjMmYzNGE2YjNmNmE1N2MxMzNhYjdiOGM2ZjgxZGM0MDE1NTQ4M0A3ZjE1YjEwODdmMjUwNzQ4QDBjMDU0YjcwNDhlMmY5NTE1ZWE3YWU="));
Query searchQuery = new NativeSearchQueryBuilder()
.withFilter(boolQuery)
.build();
SearchHits<ScResults> articles = elasticsearchTemplate.search(searchQuery, ScResults.class);
Since you're trying to do an exact match on a string with a term query, you need to do it on the data.keyword field which is not analyzed. Since the data field is a text field, hence analyzed by the standard analyzer, not only are all letters lowercased but the = sign at the end also gets stripped off, so there's no way this can match (unless you use a match query on the data field but then you'd not do exact matching anymore).
POST _analyze
{
"analyzer": "standard",
"text": "YWRkTGlxdWlkaXR5UHJveHlAMDAwMDAwMDAwMDAwMDAwMDA1MDBlYmQzMDRjMmYzNGE2YjNmNmE1N2MxMzNhYjdiOGM2ZjgxZGM0MDE1NTQ4M0A3ZjE1YjEwODdmMjUwNzQ4QDBjMDU0YjcwNDhlMmY5NTE1ZWE3YWU="
}
Results:
{
"tokens" : [
{
"token" : "ywrktglxdwlkaxr5uhjvehlamdawmdawmdawmdawmdawmda1mdblymqzmdrjmmyznge2yjnmnme1n2mxmznhyjdiogm2zjgxzgm0mde1ntq4m0a3zje1yjewoddmmjuwnzq4qdbjmdu0yjcwndhlmmy5nte1zwe3ywu",
"start_offset" : 0,
"end_offset" : 163,
"type" : "<ALPHANUM>",
"position" : 0
}
]
}
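The mismatch can be reproduced without a cluster. The sketch below is plain Java and only a crude stand-in for the standard analyzer (it splits on non-alphanumeric characters and lowercases, which is enough for this particular value). The class, the method names, and the shortened Base64 value are all hypothetical; this is not Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TermVsAnalyzedSketch {
    // Crude approximation of the standard analyzer for this kind of value:
    // split on anything that is not a letter or digit, then lowercase.
    static List<String> roughAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{Alnum}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A term query does not analyze its input: it looks for the raw value
    // among the indexed tokens.
    static boolean termMatches(List<String> indexedTokens, String rawValue) {
        return indexedTokens.contains(rawValue);
    }

    public static void main(String[] args) {
        String value = "YWRkTGlxdWlkaXR5UHJveHk="; // shortened, hypothetical value

        // "data" (text field): indexed as a lowercased token without '='
        System.out.println(termMatches(roughAnalyze(value), value)); // prints false

        // "data.keyword" (keyword field): indexed verbatim as a single term
        System.out.println(termMatches(List.of(value), value)); // prints true
    }
}
```

In the Java query from the question, this corresponds to targeting `data.keyword` instead of `data` in the term query.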
I'm using mongo 4.2.15
Here is entry:
{
"keys": {
"country": "US",
"channel": "c999"
},
"counters": {
"sale": 0
},
"increments": null
}
I want to be able to initialize the counter set as well as increment the counters.sale value and save a snapshot of the increment result to the increments property. Something like this:
db.getCollection('counterSets').update(
{ "$and" : [
{ "keys.country" : "US"},
{ "keys.channel" : "c999"}
]
},
{ "$inc" :
{ "counters.sale" : 10
},
"$set" :
{ "keys" :
{ "country" : "US", "channel" : "c999"},
"increments":
{ "3000c058-b8a7-4cff-915b-4979ef9a6ed9": {"counters" : "$counters"} }
}
},
{upsert: true})
The result is:
{
"_id" : ObjectId("61965aba1501d6eb40588ba0"),
"keys" : {
"country" : "US",
"channel" : "c999"
},
"counters" : {
"sale" : 10.0
},
"increments" : {
"3000c058-b8a7-4cff-915b-4979ef9a6ed9" : {
"counters" : "$counters"
}
}
}
Is it possible to do such an update, which somehow copies the increment result from the root object's counters to the child increments.3000c058-b8a7-4cff-915b-4979ef9a6ed9.counters, with a single upsert? I want to implement a safe increment. Maybe you can suggest another design?
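In order to use expressions such as "$counters", your $set has to be part of an aggregation pipeline; in a plain update document, "$counters" is stored as a literal string. So your query should look like this: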
NOTE: I've added square brackets to the update
db.getCollection('counterSets').update(
{ "$and" : [
{ "keys.country" : "US"},
{ "keys.channel" : "c999"}
]
},
[ {"$set": {"counters.sale": {"$sum":["$counters.sale", 10]}}}, {"$set": {"increments.x": "$counters"}}],
{upsert: true})
I haven't found any information about the atomicity of aggregation pipelines, so use this carefully.
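The point of the two-stage pipeline is that the second $set sees the document as left by the first one. The plain-Java sketch below models that ordering on in-memory maps. It is a simplified model under stated assumptions, not MongoDB's actual semantics or driver API: `applyIncrement`, `sale`, and `copyOf` are hypothetical helpers, and a missing counters.sale is simply treated as 0.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PipelineUpdateSketch {
    // Reads counters.sale, treating a missing or non-numeric value as 0.
    static long sale(Object counters) {
        if (counters instanceof Map) {
            Object v = ((Map<?, ?>) counters).get("sale");
            if (v instanceof Number) {
                return ((Number) v).longValue();
            }
        }
        return 0L;
    }

    // Shallow copy of a sub-document; anything that is not a map becomes empty.
    static Map<String, Object> copyOf(Object subDocument) {
        Map<String, Object> copy = new LinkedHashMap<>();
        if (subDocument instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) subDocument).entrySet()) {
                copy.put(String.valueOf(e.getKey()), e.getValue());
            }
        }
        return copy;
    }

    static Map<String, Object> applyIncrement(Map<String, Object> doc, String incrementId, long delta) {
        Map<String, Object> result = new LinkedHashMap<>(doc);

        // Stage 1: counters.sale = counters.sale + delta
        Map<String, Object> counters = copyOf(doc.get("counters"));
        counters.put("sale", sale(doc.get("counters")) + delta);
        result.put("counters", counters);

        // Stage 2: increments.<id> = snapshot of counters as produced by stage 1
        Map<String, Object> increments = copyOf(doc.get("increments"));
        increments.put(incrementId, new LinkedHashMap<>(counters));
        result.put("increments", increments);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> first = applyIncrement(new LinkedHashMap<>(), "inc-1", 10);
        Map<String, Object> second = applyIncrement(first, "inc-2", 5);
        System.out.println(second);
    }
}
```

Each snapshot keeps the counter value at the time of its own increment, so earlier entries under increments are not affected by later updates.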
I have a field called Description which is a text field and has data like:
This is a good thing for versions before 3.2 but bad for 3.5 and later
I want to run range query on this type of text. I know that for a field containing only Dates/Age(Numbers) or even String Ids, we can use queries like
{
"query": {
"range" : {
"age" : {
"gte" : 10,
"lte" : 20,
"boost" : 2.0
}
}
}
}
But I have a mixed field like the one mentioned above, and I need to perform a range query on it. Also, I cannot change the index structure. I can only run queries or do some post-processing after retrieving the results. Does anyone have an idea how to run this type of query, or how to reach my goal in post-processing after getting the results? I am using Java.
I hope I fully understand what you are looking for.
I've managed to create a simple working example.
Mappings
Using char_group tokenizer:
The char_group tokenizer breaks text into terms whenever it encounters a character which is in a defined set. It is mostly useful for cases where a simple custom tokenization is desired, and the overhead of use of the pattern tokenizer is not acceptable.
Char Group Tokenizer
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "char_group",
"tokenize_on_chars": [
"letter",
"whitespace"
]
}
}
}
},
"mappings": {
"properties": {
"text": {
"type": "text",
"fields": {
"digit": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
}
}
Post a few documents
PUT my_index/_doc/1
{
"text": "This is a good thing for versions before 3.2 but bad for 3.5 and later"
}
PUT my_index/_doc/2
{
"text": "This is a good thing for versions before 5 but bad for 6 and later"
}
Search Query
GET my_index/_search
{
"query": {
"range": {
"text.digit": {
"gte": 3.2,
"lte": 3.5
}
}
}
}
Results
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"text" : "This is a good thing for versions before 3.2 but bad for 3.5 and later"
}
}
]
}
Another Search Query
GET my_index/_search
{
"query": {
"range": {
"text.digit": {
"gt": 3.5
}
}
}
}
Results
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"text" : "This is a good thing for versions before 5 but bad for 6 and later"
}
}
]
}
Analyze Query
Play with the following query until you get the desired results.
It is already compatible with your example:
This is a good thing for versions before 3.2 but bad for 3.5 and later
POST _analyze
{
"tokenizer": {
"type": "char_group",
"tokenize_on_chars": [
"letter",
"whitespace"
]
},
"text": "This is a good thing for versions before 3.2 but bad for 3.5 and later"
}
Hope this helps
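For the post-processing route mentioned in the question, the same tokenization can be approximated in plain Java. This is a sketch under assumptions: `tokenize` only imitates a char_group tokenizer configured to split on letters and whitespace, and `inRange` is a hypothetical helper that compares terms as strings, which is how a range query on a text field orders its terms.

```java
import java.util.ArrayList;
import java.util.List;

public class CharGroupSketch {
    // Imitates char_group with tokenize_on_chars = [letter, whitespace]:
    // letters and whitespace act as separators, everything else is kept.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c) || Character.isWhitespace(c)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // String (lexicographic) comparison, not numeric comparison.
    static boolean inRange(String term, String gte, String lte) {
        return term.compareTo(gte) >= 0 && term.compareTo(lte) <= 0;
    }

    public static void main(String[] args) {
        String text = "This is a good thing for versions before 3.2 but bad for 3.5 and later";
        System.out.println(tokenize(text)); // prints [3.2, 3.5]

        // Caveat: terms are ordered as strings, so "10" sorts before "3.2".
        System.out.println(inRange("10", "3.2", "9")); // prints false
    }
}
```

Because the terms are compared as strings, this works for the single-digit versions in the example, but a value such as 10 would sort before 3.2. If numeric ordering matters, parsing the extracted tokens into numbers during post-processing is the safer option.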
I have a document in a MongoDB, which looks like follows.
{
"_id" : ObjectId("5ceb812b3ec6d22cb94c82ca"),
"key" : "KEYCODE001",
"values" : [
{
"classId" : "CLASS_01",
"objects" : [
{
"code" : "DD0001"
},
{
"code" : "DD0010"
}
]
},
{
"classId" : "CLASS_02",
"objects" : [
{
"code" : "AD0001"
}
]
}
]
}
I am interested in getting a result like follows.
{
"classId" : "CLASS_01",
"objects" : [
{
"code" : "DD0001"
},
{
"code" : "DD0010"
}
]
}
To get this, I came up with an aggregation pipeline in Robo 3T, which looks like follows. And it's working as expected.
[
{
$match:{
'key':'KEYCODE001'
}
},
{
"$unwind":{
"path": "$values",
"preserveNullAndEmptyArrays": true
}
},
{
"$unwind":{
"path": "$values.objects",
"preserveNullAndEmptyArrays": true
}
},
{
$match:{
'values.classId':'CLASS_01'
}
},
{
$project:{
'object':'$values.objects',
'classId':'$values.classId'
}
},
{
$group:{
'_id':'$classId',
'objects':{
$push:'$object'
}
}
},
{
$project:{
'_id':0,
'classId':'$_id',
'objects':'$objects'
}
}
]
Now, when I try to do the same in a SpringBoot application, I can't get it running. I ended up having the error java.lang.IllegalArgumentException: Invalid reference '$complication'!. Following is what I have done in Java so far.
final Aggregation aggregation = newAggregation(
match(Criteria.where("key").is("KEYCODE001")),
unwind("$values", true),
unwind("$values.objects", true),
match(Criteria.where("classId").is("CLASS_01")),
project().and("$values.classId").as("classId").and("$values.objects").as("object"),
group("classId", "objects").push("$object").as("objects").first("$classId").as("_id"),
project().and("$_id").as("classId").and("$objects").as("objects")
);
What am I doing wrong? Upon research, I found that multiple fields in group does not work or something like that (please refer to this question). So, is what I am currently doing even possible in Spring Boot?
After hours of debugging and trial and error, I found that the following solution works.
final Aggregation aggregation = newAggregation(
match(Criteria.where("key").is("KEYCODE001")),
unwind("values", true),
unwind("values.objects", true),
match(Criteria.where("values.classId").is("CLASS_01")),
project().and("values.classId").as("classId").and("values.objects").as("object"),
group(Fields.from(Fields.field("_id", "classId"))).push("object").as("objects"),
project().and("_id").as("classId").and("objects").as("objects")
);
It all boils down to group(Fields.from(Fields.field("_id", "classId"))).push("object").as("objects"), which introduces an org.springframework.data.mongodb.core.aggregation.Fields object that wraps a list of org.springframework.data.mongodb.core.aggregation.Field objects. A Field encapsulates both the name of the field and its target. This produced the following pipeline, which matches the expected one.
[
{
"$match" :{
"key" : "KEYCODE001"
}
},
{
"$unwind" :{
"path" : "$values", "preserveNullAndEmptyArrays" : true
}
},
{
"$unwind" :{
"path" : "$values.objects", "preserveNullAndEmptyArrays" : true
}
},
{
"$match" :{
"values.classId" : "CLASS_01"
}
},
{
"$project" :{
"classId" : "$values.classId", "object" : "$values.objects"
}
},
{
"$group" :{
"_id" : "$classId",
"objects" :{
"$push" : "$object"
}
}
},
{
"$project" :{
"classId" : "$_id", "objects" : 1
}
}
]
Additionally, I figured out that there is no need to use the $ sign everywhere.
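To make clear what the pipeline computes, here is the same unwind, match, and group sequence expressed over in-memory collections with Java streams. This is only an illustration under assumptions: `ClassEntry` and `groupCodes` are hypothetical names, Spring Data and MongoDB are not involved, and the preserveNullAndEmptyArrays behaviour is not modelled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class UnwindGroupSketch {
    // One element of the "values" array: a classId plus the codes of its objects.
    static final class ClassEntry {
        final String classId;
        final List<String> codes;

        ClassEntry(String classId, List<String> codes) {
            this.classId = classId;
            this.codes = codes;
        }
    }

    static Map<String, List<String>> groupCodes(List<ClassEntry> values, String wantedClassId) {
        return values.stream()
                // $unwind values, then $unwind values.objects: one (classId, code) pair per object
                .flatMap(v -> v.codes.stream().map(code -> Map.entry(v.classId, code)))
                // $match on values.classId
                .filter(e -> e.getKey().equals(wantedClassId))
                // $group by classId, $push each object into "objects"
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        LinkedHashMap::new,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<ClassEntry> values = List.of(
                new ClassEntry("CLASS_01", List.of("DD0001", "DD0010")),
                new ClassEntry("CLASS_02", List.of("AD0001")));
        System.out.println(groupCodes(values, "CLASS_01")); // prints {CLASS_01=[DD0001, DD0010]}
    }
}
```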
I am getting very awkward results when I run a highlight search against Elasticsearch 2.4. I query for a term that contains a hyphen ('-'), e.g. Drug-related, and I get a result set. The responses are generally correct, but some of the fragments look like this:
Result:
1) other costs of drug abuse include drug-related deaths, cocaine, crack, and methamphetamines.
2) Drug-related crimes are, drug-related wealth, Escobar financed a private army to conduct,
3) in was inadvertently driven into the middle of a Drug
4) -related shootout between traffickers at the international airport
Problem: the text (Drug-related) has been broken into 2 fragments (3 and 4).
Expected: drug-related should come back on a single line.
Here is my query
{"size" : 5000,
"query" : {
"bool" : {
"must" : [ {
"match" : {
"bookId" : {
"query" : "<SomeID>",
"type" : "boolean",
"operator" : "AND"
}
}
}, {
"match" : {
"contentType" : {
"query" : "booktext",
"type" : "boolean",
"operator" : "AND"
}
}
}, {
"bool" : {
"must" : {
"match" : {
"content" : {
"query" : "drug-related*",
"type" : "phrase_prefix"
}
}
}
}
} ]
}
},
"highlight" : {
"pre_tags" : [ "" ],
"post_tags" : [ "" ],
"fragment_size" : 60,
"number_of_fragments" : 5000,
"boundary_max_scan" : 10,
"highlight_query" : {
"bool" : {
"must" : [ {
"match" : {
"bookId" : {
"query" : "<BookID>",
"type" : "boolean",
"operator" : "AND"
}
}
}, {
"match" : {
"contentType" : {
"query" : "booktext",
"type" : "boolean",
"operator" : "AND"
}
}
}, {
"bool" : {
"must" : {
"match" : {
"content" : {
"query" : "drug-related*",
"type" : "phrase_prefix"
}
}
}
}
} ]
}
},
"fields" : {
"content" : { }
}
}
}
Query response (I cannot paste the full content here, but I am adding the response hits):
{content=[content], fragments[[ accurate figures available regarding the cost of drug-related prostitution, mentioned, other costs of drug abuse include drug-related deaths, cocaine, crack, and methamphetamines.
Drug-related crimes are, drug-related wealth, Escobar financed a private army to conduct, believed to be drug related. According to a news story on this, ,” the President wrote. “We have him.” (Bio. 2016)
Drug-related violence, in was inadvertently driven into the middle of a Drug, -related shootout between traffickers at the international airport]]}
I just don't see a pattern here. Can anyone please explain the reason for this?