highlighted query has hits but misses highlighted fragments (for some documents) - java

I am getting odd results when I run a highlighted search against Elasticsearch 2.4. I query for a term that contains a hyphen ('-'), e.g. Drug-related, and I get a result set. The responses are generally correct, but some of the results' fragments look like this:
Result:
1) other costs of drug abuse include drug-related deaths, cocaine, crack, and methamphetamines.
2) Drug-related crimes are, drug-related wealth, Escobar financed a private army to conduct,
3) in was inadvertently driven into the middle of a Drug
4) -related shootout between traffickers at the international airport
Problem: the text (Drug-related) has been broken into two fragments (3 and 4).
Expected: drug-related should come back within a single fragment.
Here is my query
{"size" : 5000,
"query" : {
"bool" : {
"must" : [ {
"match" : {
"bookId" : {
"query" : "<SomeID>",
"type" : "boolean",
"operator" : "AND"
}
}
}, {
"match" : {
"contentType" : {
"query" : "booktext",
"type" : "boolean",
"operator" : "AND"
}
}
}, {
"bool" : {
"must" : {
"match" : {
"content" : {
"query" : "drug-related*",
"type" : "phrase_prefix"
}
}
}
}
} ]
}
},
"highlight" : {
"pre_tags" : [ "" ],
"post_tags" : [ "" ],
"fragment_size" : 60,
"number_of_fragments" : 5000,
"boundary_max_scan" : 10,
"highlight_query" : {
"bool" : {
"must" : [ {
"match" : {
"bookId" : {
"query" : "<BookID>",
"type" : "boolean",
"operator" : "AND"
}
}
}, {
"match" : {
"contentType" : {
"query" : "booktext",
"type" : "boolean",
"operator" : "AND"
}
}
}, {
"bool" : {
"must" : {
"match" : {
"content" : {
"query" : "drug-related*",
"type" : "phrase_prefix"
}
}
}
}
} ]
}
},
"fields" : {
"content" : { }
}
}
}
Query response (I cannot paste the full content here, so I am adding only the response hits):
{content=[content], fragments[[ accurate figures available regarding the cost of drug-related prostitution, mentioned, other costs of drug abuse include drug-related deaths, cocaine, crack, and methamphetamines.
Drug-related crimes are, drug-related wealth, Escobar financed a private army to conduct, believed to be drug related. According to a news story on this, ,” the President wrote. “We have him.” (Bio. 2016)
Drug-related violence, in was inadvertently driven into the middle of a Drug, -related shootout between traffickers at the international airport]]}
I just don't see a pattern here. Can anyone please explain the reason for this?

Related

Elasticsearch is not working with Alphanumeric

I have alphanumeric codes like AA111, 111AA, AA-111, AAAA, 1111. Below is the Elasticsearch mapping for the field:
"name" : {
"type" : "text",
"analyzer" : "standard",
"fields" : {
"lower_case_sort" : {
"type" : "keyword",
"normalizer" : "lowercase"
}
},
"copy_to" : "default"
}
The mapping of the default field looks like this:
"default" : {
"type" : "text",
"analyzer" : "index_ngram",
"search_analyzer" : "search_ngram"
},
When we search for AAA or AA, it returns results. But when we search for 111, it does not return any results.
Below is the query:
"bool" : {
"filter" : [
{
"match" : {
"default" : {
"query" : "111",
"operator" : "AND",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
},
This happens because the analyzer on your default field removes the numbers from your text (the simple analyzer is one that does this). You need to use a tokenizer that doesn't remove them, such as edge-ngram, and search on its output, or use the standard analyzer, which also keeps numbers.
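The effect described in this answer can be illustrated with a small plain-Java sketch. It does not use Elasticsearch itself: LetterOnlyAnalyzerSketch is a hypothetical stand-in that splits on non-letters and lowercases, which is roughly how the simple analyzer behaves. The definition of index_ngram is not shown in the question, so whether it works this way is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a letters-only analyzer (similar in spirit to the
// "simple" analyzer): split on anything that is not a letter, then lowercase.
final class LetterOnlyAnalyzerSketch {

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            // split() can yield a leading empty string, e.g. for "111AA"
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The sample codes from the question
        for (String code : new String[] {"AA111", "111AA", "AA-111", "AAAA", "1111"}) {
            System.out.println(code + " -> " + analyze(code));
        }
        // A purely numeric search term produces no tokens at all,
        // so there is nothing left to match against.
        System.out.println("111 -> " + analyze("111"));
    }
}
```

With an analyzer like this, the digits never reach the index, so a query for 111 has no term to look up, while AA still matches.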

Elastic search term query not working on a specific field

I'm new to Elasticsearch.
This is what the index looks like:
{
"scresults-000001" : {
"aliases" : {
"scresults" : { }
},
"mappings" : {
"properties" : {
"callType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"code" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"data" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"esdtValues" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"gasLimit" : {
"type" : "long"
},
AND MORE OTHER Fields.......
I'm trying to create a search query in Java that looks like this:
{
"bool" : {
"filter" : [
{
"term" : {
"sender" : {
"value" : "sendervalue",
"boost" : 1.0
}
}
},
{
"term" : {
"data" : {
"value" : "YWRkTGlxdWlkaXR5UHJveHlAMDAwMDAwMDAwMDAwMDAwMDA1MDBlYmQzMDRjMmYzNGE2YjNmNmE1N2MxMzNhYjdiOGM2ZjgxZGM0MDE1NTQ4M0A3ZjE1YjEwODdmMjUwNzQ4QDBjMDU0YjcwNDhlMmY5NTE1ZWE3YWU=",
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
If I run this query I get 0 hits. If I replace the field "data" with another field, it works. I don't understand what's different.
This is how I actually create the query in Java + Spring Boot:
QueryBuilder boolQuery = QueryBuilders.boolQuery()
.filter(QueryBuilders.termQuery("sender", "sendervalue"))
.filter(QueryBuilders.termQuery("data",
"YWRkTGlxdWlkaXR5UHJveHlAMDAwMDAwMDAwMDAwMDAwMDA1MDBlYmQzMDRjMmYzNGE2YjNmNmE1N2MxMzNhYjdiOGM2ZjgxZGM0MDE1NTQ4M0A3ZjE1YjEwODdmMjUwNzQ4QDBjMDU0YjcwNDhlMmY5NTE1ZWE3YWU="));
Query searchQuery = new NativeSearchQueryBuilder()
.withFilter(boolQuery)
.build();
SearchHits<ScResults> articles = elasticsearchTemplate.search(searchQuery, ScResults.class);
Since you're trying to do an exact match on a string with a term query, you need to run it against the data.keyword field, which is not analyzed. The data field is a text field and is therefore analyzed by the standard analyzer: not only are all letters lowercased, but the = sign at the end also gets stripped off, so there's no way the term can match (unless you use a match query on the data field, but then you would no longer be doing exact matching). You can see this with the _analyze API:
POST _analyze
{
"analyzer": "standard",
"text": "YWRkTGlxdWlkaXR5UHJveHlAMDAwMDAwMDAwMDAwMDAwMDA1MDBlYmQzMDRjMmYzNGE2YjNmNmE1N2MxMzNhYjdiOGM2ZjgxZGM0MDE1NTQ4M0A3ZjE1YjEwODdmMjUwNzQ4QDBjMDU0YjcwNDhlMmY5NTE1ZWE3YWU="
}
Results:
{
"tokens" : [
{
"token" : "ywrktglxdwlkaxr5uhjvehlamdawmdawmdawmdawmdawmda1mdblymqzmdrjmmyznge2yjnmnme1n2mxmznhyjdiogm2zjgxzgm0mde1ntq4m0a3zje1yjewoddmmjuwnzq4qdbjmdu0yjcwndhlmmy5nte1zwe3ywu",
"start_offset" : 0,
"end_offset" : 163,
"type" : "<ALPHANUM>",
"position" : 0
}
]
}
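The mismatch can be reproduced in plain Java. This is only a sketch: approximateStandardToken is a hypothetical helper that imitates, for this kind of input only, what the _analyze output above shows (lowercasing and dropping the trailing =), and the value used is a shortened, made-up stand-in for the one in the question.

```java
import java.util.Locale;

final class TermVsAnalyzedTokenSketch {

    // Rough imitation of the _analyze result above for a single base64-like
    // value: letters are lowercased and the trailing '=' padding is dropped.
    // This is NOT the real standard analyzer, just enough to show the effect.
    static String approximateStandardToken(String value) {
        return value.toLowerCase(Locale.ROOT).replaceAll("=+$", "");
    }

    // A term query compares the supplied value with the indexed token verbatim.
    static boolean termMatches(String indexedToken, String queryValue) {
        return indexedToken.equals(queryValue);
    }

    public static void main(String[] args) {
        String value = "YWRkTGlxdWlkaXR5UHJveHk="; // shortened, hypothetical value

        String textFieldToken = approximateStandardToken(value); // what "data" would hold
        String keywordFieldToken = value;                        // what "data.keyword" would hold

        System.out.println("term on data:         " + termMatches(textFieldToken, value));
        System.out.println("term on data.keyword: " + termMatches(keywordFieldToken, value));
    }
}
```

In the Java code from the question, the corresponding change would be to pass "data.keyword" instead of "data" to QueryBuilders.termQuery.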

MongoTemplate upsert property snapshot

I'm using MongoDB 4.2.15.
Here is an entry:
{
"keys": {
"country": "US",
"channel": "c999"
},
"counters": {
"sale": 0
},
"increments": null
}
I want to be able to initialize the counter set, increment the counters.sale value, and save a snapshot of the increment result to the increments property. Something like this:
db.getCollection('counterSets').update(
{ "$and" : [
{ "keys.country" : "US"},
{ "keys.channel" : "c999"}
]
},
{ "$inc" :
{ "counters.sale" : 10
},
"$set" :
{ "keys" :
{ "country" : "US", "channel" : "c999"},
"increments":
{ "3000c058-b8a7-4cff-915b-4979ef9a6ed9": {"counters" : "$counters"} }
}
},
{upsert: true})
The result is:
{
"_id" : ObjectId("61965aba1501d6eb40588ba0"),
"keys" : {
"country" : "US",
"channel" : "c999"
},
"counters" : {
"sale" : 10.0
},
"increments" : {
"3000c058-b8a7-4cff-915b-4979ef9a6ed9" : {
"counters" : "$counters"
}
}
}
Is it possible to do an update that copies the increment result from the root object's counters to the child increments.3000c058-b8a7-4cff-915b-4979ef9a6ed9.counters in a single upsert? I want to implement a safe increment. Maybe you can suggest another design?
To use expressions, your $set has to be part of an aggregation pipeline, so your query should look like the one below.
NOTE: I've added square brackets around the update.
db.getCollection('counterSets').update(
{ "$and" : [
{ "keys.country" : "US"},
{ "keys.channel" : "c999"}
]
},
[ {"$set": {"counters.sale": {"$sum":["$counters.sale", 10]}}}, {"$set": {"increments.x": "$counters"}}],
{upsert: true})
I haven't found any information about the atomicity of aggregation pipelines, so use this carefully.
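The order of the two $set stages matters: the second stage reads counters after the first stage has already changed it. The sketch below mimics that on plain Java maps. It does not use the MongoDB driver; applyPipeline is a hypothetical helper, the key "x" is taken from the answer's example, and treating a missing counters or increments value as an empty sub-document is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PipelineUpdateSketch {

    // Stage 1: counters.sale = (counters.sale, or 0 if absent) + delta.
    // Stage 2: increments.<key> = a copy of the counters produced by stage 1.
    static Map<String, Object> applyPipeline(Map<String, Object> doc, String key, long delta) {
        Map<String, Object> result = new LinkedHashMap<>(doc);

        // Stage 1: increment the counter.
        Map<String, Object> counters = new LinkedHashMap<>(asMap(result.get("counters")));
        Object current = counters.get("sale");
        long sale = (current instanceof Number) ? ((Number) current).longValue() : 0L;
        counters.put("sale", sale + delta);
        result.put("counters", counters);

        // Stage 2: snapshot the updated counters, not the original ones.
        Map<String, Object> increments = new LinkedHashMap<>(asMap(result.get("increments")));
        increments.put(key, new LinkedHashMap<>(counters));
        result.put("increments", increments);
        return result;
    }

    // Assumption of this sketch: anything that is not a map counts as empty.
    @SuppressWarnings("unchecked")
    private static Map<String, Object> asMap(Object value) {
        return (value instanceof Map)
                ? (Map<String, Object>) value
                : new LinkedHashMap<String, Object>();
    }

    public static void main(String[] args) {
        Map<String, Object> counters = new LinkedHashMap<>();
        counters.put("sale", 0L);
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("counters", counters);

        System.out.println(applyPipeline(doc, "x", 10));
    }
}
```

By contrast, in the original non-pipeline update the string "$counters" is stored literally, which is what the result in the question shows.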

How can I retrieve the value from a column instead of the link using Spring Data JPA?

I'm using a projection to retrieve a list of matches with the teams inline.
Projection:
@Projection(name = "matchInlineTeams", types = { Match.class })
public interface MatchInlineTeams {
Team getHomeTeam();
Long getHomeTeamGoals();
Long getAwayTeamGoals();
Team getAwayTeam();
}
And my result is a collection of these:
{
"homeTeam" : {
"teamName" : "Banfield",
"teamFoundation" : "1896-01-21T03:00:00.000+0000",
"teamCity" : 73,
"teamCountry" : "ARG",
"handler" : { },
"hibernateLazyInitializer" : { }
},
"homeTeamGoals" : 2,
"awayTeamGoals" : 0,
"awayTeam" : {
"teamName" : "Gimnasia (LP)",
"teamFoundation" : "1887-06-03T03:00:00.000+0000",
"teamCity" : 76,
"teamCountry" : "ARG",
"handler" : { },
"hibernateLazyInitializer" : { }
},
"_links" : {
"self" : {
"href" : "http://localhost:8080/matches/1"
},
"match" : {
"href" : "http://localhost:8080/matches/1{?projection}",
"templated" : true
},
"goals" : {
"href" : "http://localhost:8080/matches/1/goals"
},
"homeTeam" : {
"href" : "http://localhost:8080/matches/1/homeTeam"
},
"competition" : {
"href" : "http://localhost:8080/matches/1/competition"
},
"matchStadium" : {
"href" : "http://localhost:8080/matches/1/matchStadium"
},
"awayTeam" : {
"href" : "http://localhost:8080/matches/1/awayTeam"
}
}
}
I need to do many calculations for a stats app, and the logic lives in the front end. To build a match history between two teams I make this request, and it takes about a second to retrieve everything, which is fine.
My problem now is that I want to build a table from the match history, so I can't request only the matches between two teams; I have to request all matches in which a team participated.
That no longer works well, because instead of 200 matches I get 3500 in the response, and it takes around 20 seconds to build.
I'm guessing this is because the API returns all the links and resolves both teams for each object, which is fine, but I don't need that. Is there a way to create a projection (or any other class) that returns the literal value of my column instead of resolving the object reference?
I want my result to be like this:
{
"homeTeam" : 10,
"homeTeamGoals" : 2,
"awayTeamGoals" : 0,
"awayTeam" : 36,
"_links" : {
"self" : {
"href" : "http://localhost:8080/matches/1"
},
"match" : {
"href" : "http://localhost:8080/matches/1{?projection}",
"templated" : true
}
}
Once my table is built, I will call the teams endpoint to resolve the team names.
So what I really need is to make this faster (about 20 times faster). If this is not the right path, I would very much appreciate a suggestion.

Elasticsearch combining queries with Boolean query

I'm trying to combine multiple queries in Elasticsearch using a boolean query, but the result is not what I'm expecting. For example:
If I have the following documents (among others):
DOC 1:
{
"name":"Iphone 5",
"product_suggestions":{
"input":[
"iphone 5",
"apple"
]
},
"description":"Iphone 5 - The almost last version",
"brand":"Apple",
"brand_facet":"Apple",
"state_id":"2",
"user_state_description":"Almost New",
"product_type_id":"1",
"current_price":350,
"finish_date":"2014/06/20 14:12",
"finish_date_ms":1403273520
}
DOC 2:
{
"name":"Apple II Lisa",
"product_suggestions":{
"input":[
"apple ii lisa",
"apple"
]
},
"description":"Make a offer and I Apple II Lisa!!",
"brand":"Apple",
"brand_facet":"Apple",
"state_id":"2",
"user_state_description":"Used",
"product_type_id":"1",
"current_price":150,
"finish_date":"2014/06/15 16:12",
"finish_date_ms":1402848720
}
DOC 3:
{
"name":"Iphone 5s",
"product_suggestions":{
"input":[
"iphone 5s",
"apple"
]
},
"description":"Iphone 5s 32Gb like new with a few scratches bla bla bla",
"brand":"Apple",
"brand_facet":"Apple",
"state_id":"1",
"user_state_description":"New",
"product_type_id":"2",
"current_price":510.1,
"finish_date":"2014/06/10 14:12",
"finish_date_ms":1402409520
}
DOC 4:
{
"name":"Iphone 4s",
"product_suggestions":{
"input":[
"iphone 4s",
"apple"
]
},
"description":"Iphone 4s 16Gb Mint conditions and unlocked to all network",
"brand":"Apple",
"brand_facet":"Apple",
"state_id":"1",
"user_state_description":"Almost New",
"product_type_id":"2",
"current_price":385,
"finish_date":"2014/06/12 16:12",
"finish_date_ms":1402589520
}
And if I run the following query (get all documents and facets with the keyword "Apple" whose finish_date_ms is greater than 1402869581):
{
"from" : 1,
"size" : 20,
"query" : {
"bool" : {
"must" : {
"query_string" : {
"query" : "apple",
"default_operator" : "and",
"analyze_wildcard" : true
}
},
"must_not" : {
"range" : {
"finish_date_ms" : {
"from" : null,
"to" : 1402869581,
"include_lower" : true,
"include_upper" : false
}
}
}
}
},
"facets" : {
"brand" : {
"terms" : {
"field" : "brand_facet",
"size" : 10
}
},
"product_type_id" : {
"terms" : {
"field" : "product_type_id",
"size" : 10
}
},
"state_id" : {
"terms" : {
"field" : "state_id",
"size" : 10
}
}
}
}
This returns:
{
"took":5,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.18392482,
"hits":[
]
},
"facets":{
"brand":{
"_type":"terms",
"missing":0,
"total":1,
"other":0,
"terms":[
{
"term":"Apple",
"count":1
}
]
},
"product_type_id":{
"_type":"terms",
"missing":0,
"total":1,
"other":0,
"terms":[
{
"term":1,
"count":1
}
]
},
"state_id":{
"_type":"terms",
"missing":0,
"total":1,
"other":0,
"terms":[
{
"term":2,
"count":1
}
]
}
}
}
It should return only the document DOC 1. If I remove the range query, it returns all the documents that contain the word Apple. If I remove the "term" query, then no document is returned, so I presume the problem is in the range query.
Can anyone point me in the right direction with this?
One other important thing: this whole query is to be implemented in Java (if that helps).
Thanks!
(sorry for this huge post)
I found my mistake (a newbie mistake, to be honest).
The problem was not in the range query but at the beginning of the JSON: the from field is set to 1, but the result contains only one record, so it should be 0!
Thanks for everything!!
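The effect of from can be sketched in plain Java. This is a simplified model rather than Elasticsearch code: page is a hypothetical helper that skips the first from hits and returns at most size of the remainder, which is why hits.total is 1 while the hits array is empty.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class FromOffsetSketch {

    // Simplified model of "from" and "size": skip the first `from` matching
    // hits, then return at most `size` of what is left.
    static <T> List<T> page(List<T> matchingHits, int from, int size) {
        if (from >= matchingHits.size()) {
            return Collections.emptyList();
        }
        int end = Math.min(matchingHits.size(), from + size);
        return matchingHits.subList(from, end);
    }

    public static void main(String[] args) {
        // Only DOC 1 has finish_date_ms greater than 1402869581.
        List<String> matching = Arrays.asList("DOC 1");

        System.out.println("from=1: " + page(matching, 1, 20)); // skips the only hit
        System.out.println("from=0: " + page(matching, 0, 20)); // returns it
    }
}
```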
