I'm using Elasticsearch for the first time. I'm trying to use the completion suggester on a multi-field; I don't see any error, but I also don't get any suggestions in the response.
Mapping creation:
PUT /products5/
{
  "mappings": {
    "products": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "text": {
              "type": "keyword"
            },
            "suggest": {
              "type": "completion"
            }
          }
        }
      }
    }
  }
}
Indexing:
PUT /products5/product/1
{
  "name": "Apple iphone 5"
}
PUT /products5/product/2
{
  "name": "iphone 4 16GB"
}
PUT /products5/product/3
{
  "name": "iphone 3 SS 16GB black"
}
PUT /products5/product/4
{
  "name": "Apple iphone 4 S 16 GB white"
}
PUT /products5/product/5
{
  "name": "Apple iphone case"
}
Query:
POST /products5/product/_search
{
  "suggest": {
    "my-suggestion": {
      "prefix": "i",
      "completion": {
        "field": "name.suggest"
      }
    }
  }
}
Output:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  },
  "suggest": {
    "my-suggestion": [
      {
        "text": "i",
        "offset": 0,
        "length": 1,
        "options": []
      }
    ]
  }
}
Please guide me as to what the mistake is; I have tried every option I could think of.
At first glance this looks correct. One possible reason you don't get the expected response is that you added documents to the index before you created the mapping, so the documents were not indexed according to the mapping you specified.
I have found an issue in your mapping name: there is an inconsistency between the name of the mapping and the value you specify in the URL when you create new documents. You created the mapping in the index under the name products, but when you add new documents (and when you search) you specify product, without the trailing s, as the mapping name. It's a typo.
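For example, once the mapping above is in place, indexing the documents under the products type (matching the mapping name) and running the suggester against that same type should return options. A minimal sketch using document 2 from the question:
PUT /products5/products/2
{
  "name": "iphone 4 16GB"
}
POST /products5/products/_search
{
  "suggest": {
    "my-suggestion": {
      "prefix": "i",
      "completion": {
        "field": "name.suggest"
      }
    }
  }
}
With the completion field actually populated, the prefix "i" should now match inputs that begin with "i", such as "iphone 4 16GB".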
I have a field called Description, which is a text field and has data like:
This is a good thing for versions before 3.2 but bad for 3.5 and later
I want to run a range query on this kind of text. I know that for a field containing only dates/ages (numbers) or even string IDs, we can use queries like:
{
  "query": {
    "range": {
      "age": {
        "gte": 10,
        "lte": 20,
        "boost": 2.0
      }
    }
  }
}
But I have a mixed field like the one mentioned above, and I need to perform a range query on it. Also, I cannot change the index structure; I can only run queries or do some post-processing after retrieving the results. Does anyone have any idea how to run this type of query, or how to achieve my goal by post-processing the results? I am using Java.
I hope I fully understand what you are looking for.
I've managed to create a simple working example.
Mappings
Using char_group tokenizer:
The char_group tokenizer breaks text into terms whenever it encounters a character which is in a defined set. It is mostly useful for cases where a simple custom tokenization is desired, and the overhead of use of the pattern tokenizer is not acceptable.
Char Group Tokenizer
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "char_group",
          "tokenize_on_chars": [
            "letter",
            "whitespace"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "fields": {
          "digit": {
            "type": "text",
            "analyzer": "my_analyzer"
          }
        }
      }
    }
  }
}
Post a few documents
PUT my_index/_doc/1
{
  "text": "This is a good thing for versions before 3.2 but bad for 3.5 and later"
}
PUT my_index/_doc/2
{
  "text": "This is a good thing for versions before 5 but bad for 6 and later"
}
Search Query
GET my_index/_search
{
  "query": {
    "range": {
      "text.digit": {
        "gte": 3.2,
        "lte": 3.5
      }
    }
  }
}
Results
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"text" : "This is a good thing for versions before 3.2 but bad for 3.5 and later"
}
}
]
}
Another Search Query
GET my_index/_search
{
  "query": {
    "range": {
      "text.digit": {
        "gt": 3.5
      }
    }
  }
}
Results
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"text" : "This is a good thing for versions before 5 but bad for 6 and later"
}
}
]
}
Analyze Query
Play with the following query until you get the desired results.
It is already set up for your example:
This is a good thing for versions before 3.2 but bad for 3.5 and later
POST _analyze
{
  "tokenizer": {
    "type": "char_group",
    "tokenize_on_chars": [
      "letter",
      "whitespace"
    ]
  },
  "text": "This is a good thing for versions before 3.2 but bad for 3.5 and later"
}
Hope this helps
I'm trying to implement a custom search in Elasticsearch.
Problem statement: consider three documents inserted into Elasticsearch with a "names" field that is an array:
{
  "id": 1,
  "names": ["John Wick", "Iron man"]
}
{
  "id": 2,
  "names": ["Wick Stone", "Nick John"]
}
{
  "id": 3,
  "names": ["Manny Nick", "Stone cold"]
}
When I search for "Nick", I want to boost or give priority to documents with a name starting with "Nick", so in this case the document with id 2 should come first, followed by the document with id 3. Likewise, if I search for the whole name "Manny Nick", the doc with id 3 should be given priority.
In such a case, you may want to modify/boost the score of matched results for the required criteria. For example, match the documents containing "Nick", and at the same time boost the score of documents whose names start with "Nick", so that documents which both match "Nick" and start with "Nick" get a higher score.
One way to achieve this is by using a Function Score Query.
In the query below, a search is made for the keyword "Nick", and the score of matched documents is boosted for the criterion "names that start with Nick" using a Match Phrase Prefix query with an additional weight of 20:
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "names": "Nick"
        }
      },
      "boost": "1",
      "functions": [
        {
          "filter": {
            "match_phrase_prefix": {
              "names": "Nick"
            }
          },
          "weight": 20
        }
      ],
      "boost_mode": "sum"
    }
  }
}
Testing:
Inserted data:
{
  "id": 1,
  "names": ["John Wick", "Iron man"]
}
{
  "id": 2,
  "names": ["Wick Stone", "Nick John"]
}
{
  "id": 3,
  "names": ["Manny Nick", "Stone cold"]
}
Output:
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 20.693148,
    "hits": [
      {
        "_index": "stack_1",
        "_type": "1",
        "_id": "T9kn5WsBrk7qsVCmKBGH",
        "_score": 20.693148,
        "_source": {
          "id": 2,
          "names": [
            "Wick Stone",
            "Nick John"
          ]
        }
      },
      {
        "_index": "stack_1",
        "_type": "1",
        "_id": "Ttkm5WsBrk7qsVCm2RF_",
        "_score": 20.287682,
        "_source": {
          "id": 3,
          "names": [
            "Manny Nick",
            "Stone cold"
          ]
        }
      }
    ]
  }
}
I am trying to search the document below using a match_phrase query in Kibana, but I am not getting a response.
Please find below the document, which is available in Elasticsearch:
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2910,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "documents",
        "_type": "doc",
        "_id": "DmLD22MBFTg0XFZppYt8",
        "_score": 1.0,
        "_source": {
          "doct_country": "DE",
          "filename": "series_Accessories_v1_de-DE.pdf"
        }
      }
    ]
  }
}
Please find below the query I am using to search for the above document:
GET documents/_search
{
  "query": {
    "match_phrase": {
      "message": "Accessories_v1_de-DE.pdf"
    }
  }
}
For the above query I am getting this response:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
There are two issues. First, in your query you presumably mean to use the filename field rather than message, which is not present in your example document:
GET documents/_search
{
  "query": {
    "match_phrase": {
      "filename": "Accessories_v1_de-DE.pdf"
    }
  }
}
Second, you need Elasticsearch to know that the filename field should be indexed with _ treated as a split. This does not happen by default. One way to do this is to define your mapping as follows:
PUT /documents
{
  "mappings": {
    "document": {
      "properties": {
        "filename": { "type": "text", "analyzer": "simple" }
      }
    }
  }
}
The simple analyzer will split on any non-letter, so underscores and numbers will be treated as splits. Depending on your application, you may need finer-grained control over tokenization; see the documentation.
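You can check how the simple analyzer tokenizes the filename with the _analyze API. A quick sketch, assuming the mapping above has been applied:
POST documents/_analyze
{
  "analyzer": "simple",
  "text": "series_Accessories_v1_de-DE.pdf"
}
This should return the tokens series, accessories, v, de, de and pdf, which is what the corrected match_phrase query is matched against.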
I've recently taken up Jayway JsonPath, and I'm having trouble with how the in-path filtering works.
My JSON is structured as follows: at the top there is a shareable. A shareable has an array called user, which contains an ID and a name, and it also contains an item called dataSet, which can hold any JSON. Shareables can also exist within a dataSet.
The JSON I'm working with looks like this:
{
  "shareable": {
    "user": [
      {
        "ID": 1,
        "Name": "Bob"
      },
      {
        "ID": 2,
        "Name": "Charles"
      }
    ],
    "dataSet": [
      {
        "insulinMeasurement": {
          "timestamp": "Tuesday Morning",
          "measurement": 174,
          "unit": "pmol/L"
        }
      },
      {
        "insulinMeasurement": {
          "timestamp": "Tuesday Noon",
          "measurement": 80,
          "unit": "pmol/L"
        }
      },
      {
        "shareable": {
          "user": [
            {
              "ID": 3,
              "Name": "Jim"
            }
          ],
          "dataSet": [
            {
              "insulinMeasurement": {
                "timestamp": "Tuesday Evening",
                "measurement": 130,
                "unit": "pmol/L"
              }
            }
          ]
        }
      },
      {
        "unshareable": {
          "user": [
            {
              "ID": 2,
              "Name": "Bob"
            }
          ],
          "dataSet": [
            {
              "insulinMeasurement": {
                "timestamp": "Tuesday Night",
                "measurement": 130,
                "unit": "pmol/L"
              }
            }
          ]
        }
      }
    ]
  }
}
So what I want is all shareables that have a user with a certain ID. I figured the path I would use would look like this:
$..shareable[?(@.user[*].ID == 1)]
which here has a hardcoded ID. This returns nothing, while
$..shareable[?(@.user[0].ID == 1)]
returns any shareable where the first user's ID is 1.
I also tried something along the lines of
$..shareable[?(@.user[?(@.ID == 1)])]
which I figured should return any shareable that has a user with an ID of 1.
Am I going about this the wrong way? Do I need to somehow iterate through the user objects that exist?
Well, I figured it out, so if anyone stumbles across this: the query should look as follows:
$..shareable[?( " + user + " in @.user[*].ID )]
where user is just the int of the user ID. Basically, the right-hand side creates a list of all IDs that the shareable contains and checks whether the requested ID exists therein.
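For anyone doing this from Java with Jayway JsonPath, here is a minimal sketch of the call (the class and method names are illustrative; json stands for the document above):
import com.jayway.jsonpath.JsonPath;
import java.util.List;

public class ShareableLookup {
    // Returns every shareable whose user array contains the given ID.
    public static List<Object> readShareables(String json, int userId) {
        // Builds the filter dynamically, e.g. $..shareable[?(1 in @.user[*].ID)]
        return JsonPath.read(json, "$..shareable[?(" + userId + " in @.user[*].ID)]");
    }
}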
How can I have the tokens of a particular field returned in the result?
For example, a GET request
curl -XGET 'http://localhost:9200/twitter/tweet/1'
returns
{
  "_index" : "twitter",
  "_type" : "tweet",
  "_id" : "1",
  "_source" : {
    "user" : "kimchy",
    "postDate" : "2009-11-15T14:12:12",
    "message" : "trying out Elastic Search"
  }
}
I would like to have the tokens of the '_source.message' field included in the result.
There is also another way to do it using the following script_fields script:
curl -H 'Content-Type: application/json' -XPOST 'http://localhost:9200/test-idx/_search?pretty=true' -d '{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "terms": {
      "script": "doc[field].values",
      "params": {
        "field": "message"
      }
    }
  }
}'
It's important to note that while this script returns the actual terms that were indexed, it also caches all field values and, on large indices, can use a lot of memory. So, on large indices, it might be more useful to retrieve field values from stored fields or the source and reparse them on the fly using the following MVEL script:
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import java.io.StringReader;

// Cache the index analyzer for further use
cachedAnalyzer = (isdef cachedAnalyzer) ? cachedAnalyzer : doc.mapperService().documentMapper(doc._type.value).mappers().indexAnalyzer();
terms = [];
// Get value from the Fields Lookup
//val = _fields[field].values;
// Get value from the Source Lookup
val = _source[field];
if (val != null) {
  tokenStream = cachedAnalyzer.tokenStream(field, new StringReader(val));
  CharTermAttribute termAttribute = tokenStream.addAttribute(CharTermAttribute);
  while (tokenStream.incrementToken()) {
    terms.add(termAttribute.toString());
  }
  tokenStream.close();
}
terms
This MVEL script can be stored as config/scripts/analyze.mvel and used with the following query:
curl 'http://localhost:9200/test-idx/_search?pretty=true' -d '{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "terms": {
      "script": "analyze",
      "params": {
        "field": "message"
      }
    }
  }
}'
If you mean the tokens that have been indexed, you can run a terms facet on the message field. Increase the size value to get more entries back, or set it to 0 to get all terms.
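A sketch of such a request against the legacy facets API this answer refers to (facets were removed in Elasticsearch 2.0, where a terms aggregation plays the same role; the facet name message_terms is just illustrative):
curl -XPOST 'http://localhost:9200/twitter/_search?pretty=true' -d '{
  "query": { "match_all": {} },
  "facets": {
    "message_terms": {
      "terms": { "field": "message", "size": 100 }
    }
  }
}'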
Lucene provides the ability to store term vectors, but there is currently no way to access them through Elasticsearch (as far as I know).
Why do you need that? If you only want to check what you're indexing, you can have a look at the analyze API.
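For example, on recent versions the analyze API can run a field's own analyzer over sample text (a sketch; very old releases passed the text as a query-string parameter instead):
GET twitter/_analyze
{
  "field": "message",
  "text": "trying out Elastic Search"
}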
Nowadays, it's possible with the Term vectors API:
curl http://localhost:9200/twitter/_termvectors/1?fields=message
Result:
{
  "_index": "twitter",
  "_id": "1",
  "_version": 1,
  "found": true,
  "took": 0,
  "term_vectors": {
    "message": {
      "field_statistics": {
        "sum_doc_freq": 4,
        "doc_count": 1,
        "sum_ttf": 4
      },
      "terms": {
        "elastic": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 2,
              "start_offset": 11,
              "end_offset": 18
            }
          ]
        },
        "out": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 1,
              "start_offset": 7,
              "end_offset": 10
            }
          ]
        },
        "search": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 3,
              "start_offset": 19,
              "end_offset": 25
            }
          ]
        },
        "trying": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 6
            }
          ]
        }
      }
    }
  }
}
Note: Mapping types (here: tweet) have been removed in Elasticsearch 8.x (see the migration guide).