Elastic search match_pharse query is not working properly - java

Am trying to search the below document using match_phrase query in kibana but am not getting the response.
Please find the document below which is availabe in elastic search
{
"took":7,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"skipped":0,
"failed":0
},
"hits":{
"total":2910,
"max_score":1.0,
"hits":[
{
"_index":"documents",
"_type":"doc",
"_id":"DmLD22MBFTg0XFZppYt8",
"_score":1.0,
"_source":{
"doct_country":"DE",
"filename":"series_Accessories_v1_de-DE.pdf",
}
]
}
}
Please find the query which am using to search this above document.
GET documents/_search
{
"query": {
"match_phrase" : {
"message" : "Accessories_v1_de-DE.pdf"
}
}
}
For the above query am getting this response :
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}

There are two issues. Presumably in your query you mean to use the filename field rather than message which is not present in you example document:
GET documents/_search
{
"query": {
"match_phrase" : {
"filename" : "Accessories_v1_de-DE.pdf"
}
}
}
Second, you need Elasticsearch to know that the filename field should be indexed with _ treated as a split. This does not happen by default. One way to do this is to define your mapping as follows:
PUT /documents
{
"mappings" : {
"document" : {
"properties" : {
"filename" : { "type" : "text", "analyzer": "simple" }
}
}
}
}
The simple analyzer will split on any non-letter, so _ and numbers will be treated as splits. Depending on your application, you may need finer grained control over tokenization. See the documentation.

Related

ElasticSearch range query in a paragraph

I have a field called Description which is a text field and has data like:
This is a good thing for versions before 3.2 but bad for 3.5 and later
I want to run range query on this type of text. I know that for a field containing only Dates/Age(Numbers) or even String Ids, we can use queries like
{
"query": {
"range" : {
"age" : {
"gte" : 10,
"lte" : 20,
"boost" : 2.0
}
}
}
}
But i have a mixed field like mentioned above and I need to perform range query on that. Also, i cannot change the index structure. I can only perform queries or do some post processing after retrieving results. So anyone has any idea how to run this type of query, or even obtain my goal after getting results in the post processing? I am using Java.
I hope i fully understand what you are looking for.
I've managed to create a simple working example.
Mappings
Using char_group tokenizer:
The char_group tokenizer breaks text into terms whenever it encounters a character which is in a defined set. It is mostly useful for cases where a simple custom tokenization is desired, and the overhead of use of the pattern tokenizer is not acceptable.
Char Group Tokenizer
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "char_group",
"tokenize_on_chars": [
"letter",
"whitespace"
]
}
}
}
},
"mappings": {
"properties": {
"text": {
"type": "text",
"fields": {
"digit": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
}
}
Post a few documents
PUT my_index/_doc/1
{
"text": "This is a good thing for versions before 3.2 but bad for 3.5 and later"
}
PUT my_index/_doc/2
{
"text": "This is a good thing for versions before 5 but bad for 6 and later"
}
Search Query
GET my_index/_search
{
"query": {
"range": {
"text.digit": {
"gte": 3.2,
"lte": 3.5
}
}
}
}
Results
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"text" : "This is a good thing for versions before 3.2 but bad for 3.5 and later"
}
}
]
}
Another Search Query
GET my_index/_search
{
"query": {
"range": {
"text.digit": {
"gt": 3.5
}
}
}
}
Results
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"text" : "This is a good thing for versions before 5 but bad for 6 and later"
}
}
]
}
Analyze Query
Play with the following query till you get the desired results.
It is already compatible to your example.
This is a good thing for versions before 3.2 but bad for 3.5 and later
POST _analyze
{
"tokenizer": {
"type": "char_group",
"tokenize_on_chars": [
"letter",
"whitespace"
]
},
"text": "This is a good thing for versions before 3.2 but bad for 3.5 and later"
}
Hope this helps

ElasticSearch - fuzzy search java api results are not proper

I have indexed sample documents in elasticsearch and trying to search using fuzzy query. But am not getting any results when am search by using Java fuzzy query api.
Please find my below mapping script :
PUT productcatalog
{
"settings": {
"analysis": {
"analyzer": {
"attr_analyzer": {
"type": "custom",
"tokenizer": "letter",
"char_filter": [
"html_strip"
],
"filter": ["lowercase", "asciifolding", "stemmer_minimal_english"]
}
},
"filter" : {
"stemmer_minimal_english" : {
"type" : "stemmer",
"name" : "minimal_english"
}
}
}
},
"mappings": {
"doc": {
"properties": {
"values": {
"type": "text",
"analyzer": "attr_analyzer"
},
"catalog_type": {
"type": "text"
},
"catalog_id":{
"type": "long"
}
}
}
}
}
Please find my sample data.
PUT productcatalog/doc/1
{
"catalog_id" : "343",
"catalog_type" : "series",
"values" : "Activa Rooftop, valves, VG3000, VG3000FS, butterfly, ball"
}
PUT productcatalog/doc/2
{
"catalog_id" : "12717",
"catalog_type" : "product",
"values" : "Activa Rooftop, valves"
}
Please find my search script :
GET productcatalog/_search
{
"query": {
"match" : {
"values" : {
"query" : " activa rooftop VG3000",
"operator" : "and",
"boost": 1.0,
"fuzziness": 2,
"prefix_length": 0,
"max_expansions": 100
}
}
}
}
Am getting the below results for the above query :
{
"took": 239,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.970927,
"hits": [
{
"_index": "productcatalog",
"_type": "doc",
"_id": "1",
"_score": 0.970927,
"_source": {
"catalog_id": "343",
"catalog_type": "series",
"values": "Activa Rooftop, valves, VG3000, VG3000FS, butterfly, ball"
}
}
]
}
}
But if i use the below Java API for the same fuzzy search am not getting any results out of it.
Please find my below Java API query for fuzzy search :
QueryBuilder qb = QueryBuilders.boolQuery()
.must(QueryBuilders.fuzzyQuery("values", keyword).boost(1.0f).prefixLength(0).maxExpansions(100));
Update 1
I have tried with the below query
QueryBuilder qb = QueryBuilders.matchQuery(QueryBuilders.fuzzyQuery("values", keyword).boost(1.0f).prefixLength(0).maxExpansions(100));
But am not able to pass QueryBuilders inside matchQuery. Am getting this suggestion while am writing this query The method matchQuery(String, Object) in the type QueryBuilders is not applicable for the arguments (FuzzyQueryBuilder)
The mentioned java query is not a match query. It's a must query. you should use matchQuery instead of boolQuery().must(QueryBuilders.fuzzyQuery())
Update 1:
fuzzy query is a term query while match query is a full text query.
Also don't forget that in match query the default Operator is or operator which you should change it to and like your dsl query.

Completion Suggester in elasticsearch in mutifield

I'm using elasticsearch for the first time. I'm trying to use completion suggester in multi-field key, although I don't see any error but I don't get the response.
Mapping creation:
PUT /products5/
{
"mappings":{
"products" : {
"properties" : {
"name" : {
"type":"text",
"fields":{
"text":{
"type":"keyword"
},
"suggest":{
"type" : "completion"
}
}
}
}
}
}
}
Indexing:
PUT /products5/product/1
{
"name": "Apple iphone 5"
}
PUT /products5/product/2
{
"name": "iphone 4 16GB"
}
PUT /products5/product/3
{
"name": "iphone 3 SS 16GB black"
}
PUT /products5/product/4
{
"name": "Apple iphone 4 S 16 GB white"
}
PUT /products5/product/5
{
"name": "Apple iphone case"
}
Query:
POST /products5/product/_search
{
"suggest":{
"my-suggestion":{
"prefix":"i",
"completion":{
"field":"name.suggest"
}
}
}
}
Output:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": 0,
"hits": []
},
"suggest": {
"my-suggestion": [
{
"text": "i",
"offset": 0,
"length": 1,
"options": []
}
]
}
}
Please guide me what is the mistake, I tried every possible options.
From the first perspective this looks accurate. Probably the reason why you don't have correct response is that you added documents in the index before you created mapping in the index. And documents are not indexed according to the mapping you specified
I have found an issue in your mapping name. There is an inconsistency between name of the mapping and value which you specifies in the url when you're creating new documents. You create a mapping in the index with the name products. And when you add new documents you're specifying product as a name of the mapping of your index and it doesn't end with s. You have a typo.

Elasticsearch lowercase tokenizer quirk?

I am testing mapping for url-s in elasticsearch.
I want to be able to search for entry both by domain name with tld (e.g. example.com)
and without tld (e.g example) and for full domain document to be returned
(like, http://example.com and www.example.com and similar)
I PUT this mapping to ES - in Sense:
PUT /en_docs
{
"mappings": {
"url": {
"properties": {
"content": {
"type": "string",
"analyzer" : "urlzer"
}
}
}
},
"settings": {
"analysis": {
"analyzer": {
"urlzer": {
"type": "custom",
"tokenizer": "lowercase",
"filter": [ "stopwords_filter" ]
}
},
"filter" : {
"stopwords_filter" : {
"type" : "stop",
"stopwords" : ["http", "https", "ftp", "www"]
}
}
}
}
}
Now, when I index an url document, e.g
POST /en_docs/url
{
"content": "http://example.com"
}
I can get it by searching example.com but just example doesnt return anything.
The lowercase tokenizer im using in my analyzer, as docs say, and as direct testing of my analyzer shows, gives example and com tokens, but when I do the search for indexed document, example returns nothing:
GET /en_docs/url/_search?q=example
gets no results, but if the query is example.com, result is returned.
What am I missing?

elasticsearch - Return the tokens of a field

How can I have the tokens of a particular field returned in the result
For example, A GET request
curl -XGET 'http://localhost:9200/twitter/tweet/1'
returns
{
"_index" : "twitter",
"_type" : "tweet",
"_id" : "1",
"_source" : {
"user" : "kimchy",
"postDate" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search"
}
}
I would like to have the tokens of '_source.message' field included in the result
There is also another way to do it using the following script_fields script:
curl -H 'Content-Type: application/json' -XPOST 'http://localhost:9200/test-idx/_search?pretty=true' -d '{
"query" : {
"match_all" : { }
},
"script_fields": {
"terms" : {
"script": "doc[field].values",
"params": {
"field": "message"
}
}
}
}'
It's important to note that while this script returns the actual terms that were indexed, it also caches all field values and on large indices can use a lot of memory. So, on large indices, it might be more useful to retrieve field values from stored fields or source and reparse them again on the fly using the following MVEL script:
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import java.io.StringReader;
// Cache analyzer for further use
cachedAnalyzer=(isdef cachedAnalyzer)?cachedAnalyzer:doc.mapperService().documentMapper(doc._type.value).mappers().indexAnalyzer();
terms=[];
// Get value from Fields Lookup
//val=_fields[field].values;
// Get value from Source Lookup
val=_source[field];
if(val != null) {
tokenStream=cachedAnalyzer.tokenStream(field, new StringReader(val));
CharTermAttribute termAttribute = tokenStream.addAttribute(CharTermAttribute);
while(tokenStream.incrementToken()) {
terms.add(termAttribute.toString())
};
tokenStream.close();
}
terms
This MVEL script can be stored as config/scripts/analyze.mvel and used with the following query:
curl 'http://localhost:9200/test-idx/_search?pretty=true' -d '{
"query" : {
"match_all" : { }
},
"script_fields": {
"terms" : {
"script": "analyze",
"params": {
"field": "message"
}
}
}
}'
If you mean the tokens that have been indexed you can make a terms facet on the message field. Increase the size value in order to get more entries back, or set to 0 to get all terms.
Lucene provides the ability to store the term vectors, but there's no way to have access to it with elasticsearch by now (as far as I know).
Why do you need that? If you only want to check what you're indexing you can have a look at the analyze api.
Nowadays, it's possible with the Term vectors API:
curl http://localhost:9200/twitter/_termvectors/1?fields=message
Result:
{
"_index": "twitter",
"_id": "1",
"_version": 1,
"found": true,
"took": 0,
"term_vectors": {
"message": {
"field_statistics": {
"sum_doc_freq": 4,
"doc_count": 1,
"sum_ttf": 4
},
"terms": {
"elastic": {
"term_freq": 1,
"tokens": [
{
"position": 2,
"start_offset": 11,
"end_offset": 18
}
]
},
"out": {
"term_freq": 1,
"tokens": [
{
"position": 1,
"start_offset": 7,
"end_offset": 10
}
]
},
"search": {
"term_freq": 1,
"tokens": [
{
"position": 3,
"start_offset": 19,
"end_offset": 25
}
]
},
"trying": {
"term_freq": 1,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 6
}
]
}
}
}
}
}
Note: Mapping types (here: tweets) have been removed in Elasticsearch 8.x (see migration guide).

Categories