How to aggregate a 'non - keyword' field in elasticsearch? - java

I am trying to write an elastic-search query that should list all distinct values held by various fields in a document.When the fields are of type Keyword,the term aggregate query works fine and I can see the values with their counts listed in the buckets.But, I don't get any result when I query for the distinct citrus fruit types, the mapping is as shown below:
{
"vegetables":{
"type": "text",
"fields": {
"keyword" : {
"type" : "keyword",
"ignore_above": 256
}
}
},
"fruits": {
"properties": {
"citrus": {
"properties": {
"orange": {
"type": "long"
},
"lemon": {
"type": "long"
},
"kiwi": {
"type": "long"
}
}
}
}
}
}
and the result I am expecting is :
"aggregations": {
"distinct_citrusy_fruits"{
"buckets" : [
{
"key":"oranges",
"doc_count": 23
},
{
"key":"lemon",
"doc_count": 21
},
{
"key":"kiwi",
"doc_count": 23
}
]
}
}
when I make a term aggregation for the "vegetables" field (which is a keyword type) i am able to get the buckets as above.
How to get the distinct counts in this case?Also, I don't have the option to change the document format.
EDIT- the only workaround I have found till now is to call the mappings api and then parse the nested JSON in my code to get the key values,if there is any better solution possible, please add an answer here.

I think you cannot query or run aggregations on the field names, only on values.
For the fruits i expect the following mapping:
{
"fruits": {
"properties": {
"citrus": {
"properties": {
"kind": {
"type": "keyword"
},
"count": {
"type": "long"
}
}
}
}
}
}
Maybe you can use the _field_names field which contains every fieldname that has a value. (https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-field-names-field.html)

Related

Query Elastic DSL - Search query using spring boot data

I have the following properties file generated via Java and spring boot data elasticsearch. The file is generated in a User.java class and the property "friends" is a List where Friends is a Fiends.java file, both class file act as the model. Essentially I want to produce a select statement but in Query DSL Language using Spring Boot Data. The index is called user.
So I am trying to achieve the following SELECT * FROM User where (userName ="Tom" OR nickname="Tom" OR friendsNickname="Tom") AND userID="3793"
or (verbose-dsl)
match where (userName="Tom" OR nickname="Tom" OR friendsNickname="Tom") AND userID="3793"
"mappings": {
"properties": {
"_class": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"userName": {
"type": "text"
},
"userId": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"friends": {
"type": "nested",
"properties": {
"firstName": {
"type": "text"
},
"lastName": {
"type": "text"
},
"age": {
"type": "text"
},
"friendsNickname": {
"type": "text"
}
}
},
"nickname": {
"type": "text"
}
}
}
I have tried the following code but return 0 hits back from a elastic search but no dice returns no hits
BoolQueryBuilder query =
QueryBuilders.boolQuery()
.must(
QueryBuilders.boolQuery()
.should(QueryBuilders.matchQuery("userName", "Tom"))
.should(QueryBuilders.matchQuery("nickname", "Tom"))
.should(
QueryBuilders.nestedQuery(
"friends",
QueryBuilders.matchQuery("friendsNickname", "Tom"),
ScoreMode.None)))
.must(QueryBuilders.boolQuery().must(QueryBuilders.matchQuery("userID", "3793")));
Apologies if this seems like a simple question, My knowledge on ES is quite thin, sorry if this may seem like an obvious answer.
Great start!!
You just have a tiny mistake on the following line where you need to prefix the field name by the nested field name, i.e. friends.friendsNickname
...
QueryBuilders.matchQuery("friends.friendsNickname", "Tom"),
... ^
|
prefix
Also you have another typo where the userID should read userId according to your mapping.
Use friends.friendsNickname and also user termsQuery on userId.keyword
`
.must(QueryBuilders.boolQuery()
.should(QueryBuilders.matchQuery("userName", "Tom"))
.should(QueryBuilders.matchQuery("nickname", "Tom"))
.should(QueryBuilders.matchQuery("friends.friendsNickname", "Tom"))
)
.must(QueryBuilders.termsQuery("userId.keyword", "3793"));
`
Although I recommend changing userName, userID to keyword.
"userId": {
"type": "keyword",
"ignore_above": 256,
"fields": {
"text": {
"type": "text"
}
}
}
Then you don't have to put keyword so you just have to put userId instead of userId.keyword. If you want to have full-text search on the field is use userId.text. The disadvantage of having a text type is that you can't use the field to sort your results that's why I encourage ID fields to be of type keyword.

Elasticsearch nested sort - mismatch between document and nested object used for sorting

I've been developing a new search API with AWS Elasticsearch (version 6.2) as backend.
Right now, I'm trying to support "sort" options for the API.
My mapping is as follows (unrelated fields not included):
{
"properties": {
"id": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"description": {
"type": "text"
},
"materialDefinitionProperties": {
"type": "nested",
"properties": {
"id": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
},
"analyzer": "case_sensitive_analyzer"
},
"value" : {
"type": "nested",
"properties": {
"valueString": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
}
}
I'm attempting to allow the users sort by property value (path: materialDefinitionProperties.value.valueLong.raw).
Note that it's inside 2 levels of nested objects (materialDefinitionProperties and materialDefinitionProperties.value are nested objects).
To sort the results by the value of property with ID "PART NUMBER", my request for sorting is:
{
"fieldName": "materialDefinitionProperties.value.valueString.raw",
"nestedSort": {
"path": "materialDefinitionProperties",
"filter": {
"fieldName": "materialDefinitionProperties.id",
"value": "PART NUMBER",
"slop": 0,
"boost": 1
},
"nestedSort": {
"path": "materialDefinitionProperties.value"
}
},
"order": "ASC"
}
However, as I examined the response, the "sort" field does not match with document's property value:
{
"_index": "material-definition-index-v2",
"_type": "default",
"_id": "development_LITL4ZCNE",
"_source": {
"id": "LITL4ZCNE",
"description": [
"CPU, Intel, Cascade Lake, 8259CL, 24C, 210W, B1 Prod"
]
"materialDefinitionProperties": [
{
"id": "PART NUMBER",
"description": [],
"value": [
{
"valueString": "202-001193-001",
"isOriginal": true
}
]
}
]
},
"sort": [
"100-000018"
]
},
The document's PART NUMBER property is "202-001193-001", the "sort" field says "100-000018", which is the part number of another document.
It seems that there's a mismatch between the master document and nested object used for sorting.
This request worked well when there's only a small number of documents in the cluster. But once I backfill the cluster with ~1 million of records, the symptom appears. I've also tried creating a new ES cluster but the results are the same.
Sorting by other non-nested attributes worked well.
Did I misunderstand the concept of nested objects, or misuse the nested sort feature?
Any ideas appreciated!
This is a bug in Elasticsearch. Upgrading to 6.4.0 fixed the issue.
Issue tracker: https://github.com/elastic/elasticsearch/pull/32204
Release note: https://www.elastic.co/guide/en/elasticsearch/reference/current/release-notes-6.4.0.html

Elastic Search wildcard query not working with case insensitive ( for lower case)

I am trying to fetch records from elasticsearch using wildcard queries.
Please find the below query
get my_index12/_search
{
"query": {
"wildcard": {
"code.keyword": {
"value": "*ARG*"
}
}
}
}
It's working and giving expected results for the above query., but it is not working for the lower case value.
get my_index12/_search
{
"query": {
"wildcard": {
"code.keyword": {
"value": "*Arg*"
}
}
}
}
Try Following:
Mapping:
PUT my_index12
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"char_filter": [
"html_strip"
],
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"code": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
}
}
}
Then Run Query String Query
GET my_index12/_search
{
"query": {
"query_string": {
"default_field": "code",
"query": "AB\\-7000*"
}
}
}
It will also work for ab-7000*
Let me know if it works for you.
You have to normalize your keyword field:
ElasticSearch normalizer
Something like (from documentation):
PUT index
{
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"char_filter": [],
"filter": ["lowercase", "asciifolding"]
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"foo": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
}
UPDATE
Some additional info:
Only parts of the analysis chain that operate at the character level are applied. So for instance, if the analyzer performs both lowercasing and stemming, only the lowercasing will be applied: it would be wrong to perform stemming on a word that is missing some of its letters.
By setting analyze_wildcard to true, queries that end with a * will be analyzed and a boolean query will be built out of the different tokens, by ensuring exact matches on the first N-1 tokens, and prefix match on the last token.

How to make a JSON Parser to parse elasticsearch mapping in java?

I wanted to parse this structure which is an elasticsearch filter:
{
"filter": {
"name_synonyms_filter": {
"synonym_path": "sample.txt",
"type": "abc_synonym_filter"
},
"name_formatter": {
"name": "name_formatter",
"type": "abc_token_filter"
}
}
}
My question is how can I access individual filters without using key ("name_synonyms_filter" , etc) in java?
your JSON was impropertly formatted.
Here it is fixed:
{
"abc": [{
"name": "somename"
},
{
"name": "somename"
}
]
}
How to parse it:
let x = JSON.parse({
"abc": [{
"name": "somename"
},
{
"name": "somename"
}
]
});
console.log(x);
Let me know if you have any questions.

Can declare your JSON Schema by a reference to type?

I am trying to validate a small bit of JSON like:
{
"success": true,
"message": "all's good!"
}
which works with the schema:
{
"type": "object",
"properties": {
"success": { "type": "boolean" },
"message": { "type": "string" }
}
}
however it fails with the schema
{
"definitions": {
"response": {
"type": "object",
"properties": {
"success": { "type": "boolean" },
"message": { "type": "string" }
}
}
},
"type": { "$ref": "#/definitions/response" }
}
with the error
java.lang.AssertionError: schema resource:/json-schema/sample.schema.json was > invalid: fatal: invalid JSON Schema, cannot continue
Syntax errors:
[ {
"level" : "error",
"message" : "value has incorrect type (found object, expected one of [array, string])",
"domain" : "syntax",
"schema" : {
"loadingURI" : "resource:/json-schema/sample.schema.json#",
"pointer" : ""
},
"keyword" : "type",
"found" : "object",
"expected" : [ "array", "string" ]
} ]
level: "fatal"
are you not allowed to use a reference for a type outside the definitions section? My motivation is that this is a response to a singular case, but there are cases where this structure is nested in others as well.
If it matters I'm using json-schema-validator version 2.2.6.
PS - this is a simplified example, the actual schema is more complicated as to justify why reuse and not copying and pasting is desirable.
You can use "id" and "$ref".
id for identifying, e.g.:
{
"type": "object",
"id": "#response",
"properties": {
"success": { "type": "boolean" },
"message": { "type": "string" }
}
}
}
And then you use $ref, e.g.:
"some": { "$ref": "#response" }
or external ref:
"ext": { "$ref": "http://url.com#response" }
See
http://json-schema.org/latest/json-schema-core.html#anchor27
The value of the type keyword must be a string of the name of one of the JSON primitave types (e.g. "string", "array", etc.), or an array of these strings. That is what the error message is saying. Keyword type must be a string or an array. The closest thing to what I think you are trying to do is this ...
{
"definitions": {
"response": {
"type": "object",
"properties": {
"success": { "type": "boolean" },
"message": { "type": "string" }
}
}
},
"allOf": [{ "$ref": "#/definitions/response" }]
}
You should declare your definition in it's own file, and they have your types refer to that file reference. See How to manage multiple JSON schema files? for details.

Categories