Query Elastic DSL - Search query using spring boot data - java

I have the following index mapping, generated by Spring Data Elasticsearch from my Java model. The mapping comes from a User.java class whose "friends" property is a List of Friends objects (defined in Friends.java); both classes act as the model. Essentially I want to write the equivalent of a SELECT statement in the Elasticsearch Query DSL using Spring Data. The index is called user.
I am trying to achieve the following: SELECT * FROM User WHERE (userName="Tom" OR nickname="Tom" OR friendsNickname="Tom") AND userID="3793"
or (verbose DSL)
match where (userName="Tom" OR nickname="Tom" OR friendsNickname="Tom") AND userID="3793"
"mappings": {
"properties": {
"_class": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"userName": {
"type": "text"
},
"userId": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"friends": {
"type": "nested",
"properties": {
"firstName": {
"type": "text"
},
"lastName": {
"type": "text"
},
"age": {
"type": "text"
},
"friendsNickname": {
"type": "text"
}
}
},
"nickname": {
"type": "text"
}
}
}
I have tried the following code, but it returns 0 hits from Elasticsearch:
BoolQueryBuilder query =
QueryBuilders.boolQuery()
.must(
QueryBuilders.boolQuery()
.should(QueryBuilders.matchQuery("userName", "Tom"))
.should(QueryBuilders.matchQuery("nickname", "Tom"))
.should(
QueryBuilders.nestedQuery(
"friends",
QueryBuilders.matchQuery("friendsNickname", "Tom"),
ScoreMode.None)))
.must(QueryBuilders.boolQuery().must(QueryBuilders.matchQuery("userID", "3793")));
Apologies if this seems like a simple question with an obvious answer; my knowledge of ES is quite thin.

Great start!!
You just have a tiny mistake on the following line where you need to prefix the field name by the nested field name, i.e. friends.friendsNickname
...
QueryBuilders.matchQuery("friends.friendsNickname", "Tom"),   // "friends." prefix added
...
Also you have another typo where the userID should read userId according to your mapping.
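For comparison, the corrected query should correspond roughly to the request body below. This is only a sketch: it assembles the JSON as a plain string so it runs without the Elasticsearch client on the classpath, and the class and method names (UserSearchQuery, build) are made up for illustration. It also assumes the values need no JSON escaping.

```java
class UserSearchQuery {
    // Builds the request body for:
    // (userName OR nickname OR friends.friendsNickname) AND userId.
    // Assumption: name and userId contain no characters that need JSON escaping.
    static String build(String name, String userId) {
        return String.join("\n",
            "{",
            "  \"query\": {",
            "    \"bool\": {",
            "      \"must\": [",
            "        { \"bool\": { \"should\": [",
            "          { \"match\": { \"userName\": \"" + name + "\" } },",
            "          { \"match\": { \"nickname\": \"" + name + "\" } },",
            "          { \"nested\": {",
            "              \"path\": \"friends\",",
            "              \"query\": { \"match\": { \"friends.friendsNickname\": \"" + name + "\" } }",
            "          } }",
            "        ] } },",
            "        { \"match\": { \"userId\": \"" + userId + "\" } }",
            "      ]",
            "    }",
            "  }",
            "}");
    }

    public static void main(String[] args) {
        // Print the body so it can be pasted into Kibana Dev Tools or sent with curl.
        System.out.println(build("Tom", "3793"));
    }
}
```

Running the printed body against the user index directly is a quick way to check whether a problem lies in the query itself or in the Java builder code.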

Use friends.friendsNickname and also use termsQuery on userId.keyword
`
.must(QueryBuilders.boolQuery()
.should(QueryBuilders.matchQuery("userName", "Tom"))
.should(QueryBuilders.matchQuery("nickname", "Tom"))
.should(QueryBuilders.nestedQuery("friends", QueryBuilders.matchQuery("friends.friendsNickname", "Tom"), ScoreMode.None))
)
.must(QueryBuilders.termsQuery("userId.keyword", "3793"));
`
That said, I recommend changing userName and userId to type keyword.
"userId": {
"type": "keyword",
"ignore_above": 256,
"fields": {
"text": {
"type": "text"
}
}
}
Then you don't need the .keyword suffix: you can query userId directly instead of userId.keyword. If you want full-text search on the field, use userId.text. The disadvantage of a text field is that you can't sort on it by default, which is why I encourage ID fields to be of type keyword.

How to insert null values for an Avro map

I have a use case where I need to allow null values in an Avro map, but it seems like Avro doesn't allow unions for map values. Basically, I need to implement the equivalent of a POJO field defined as Map<String, Optional<String>>.
How can I achieve this?
The following Avro schema fails with a "No type" error:
Error:
org.apache.avro:avro-maven-plugin:1.10.0: schema failed:
No type: {"type":["null","string"]}
{
"namespace": "com.testclass.avro",
"name": "test",
"type": "record",
"fields": [
{
"name": "user",
"type": {
"name": "userdetails",
"type": "record",
"fields": [
{
"name": "isPresent",
"type": "boolean"
},
{
"name": "address",
"type": {
"type": "map",
"name": "address",
"values": {
"type": ["null","string"]
}
}
}
]
}
}
]
}
Tagging each string value with its type ("string") in the JSON data solved the problem:
"address":{"test":{"string":"a"}, "test2":{"string":"a"}}

Get available attributes (possibly recursive) from JSON Schema in Java

Let's say I've got the following JSON Schema:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://example.com/product.schema.json",
"title": "Draft JSON Schema",
"type": "object",
"properties": {
"person": {
"type": "object",
"properties": {
"details": {
"type": "object",
"properties": {
"first_name": {
"type": "string"
},
"last_name": {
"type": "string"
},
"groups": {
"type": "array",
"items": { "$ref": "#/$defs/existing_groups"
}
}
}
}
},
"$defs": {
"existing_groups": [ "Teachers", "Students" ]
},
"book": {
"type": "object",
"properties": {
"title": {
"type": "string"
},
"author": {
"type": "string"
}
}
}
}
}
From this schema, I would like to retrieve the available attributes and values at a defined depth:
For example, if person.details is given, I want first_name, last_name, groups to be returned.
If person.details.groups is given, the possible values Teachers, Students should be returned.
If book.title is given, an empty Array or Set should be returned.
Apparently you can get attribute values from JSON objects with JsonPath, but I would rather get the possible attributes (and their possible values, if any are given) from a com.networknt.schema.JsonSchema.
What is the easiest way to do this in Java?
JSON Schema is for validating data. It has nothing to do with data manipulation or extraction. It's not comparable to JSONPath in any way.
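That said, a schema is itself just a JSON document, so one possible workaround is to walk it as plain data. The sketch below is not part of any JSON Schema library: SchemaWalker, obj and attributesAt are hypothetical names, the schema is modelled as nested Maps (as a JSON parser might produce), $ref is assumed to be already inlined, and the allowed group values are assumed to be expressed with "enum".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SchemaWalker {
    // Small helper to build an ordered JSON-like object from key/value pairs.
    static Map<String, Object> obj(Object... keyValues) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            m.put((String) keyValues[i], keyValues[i + 1]);
        }
        return m;
    }

    // Hand-built stand-in for the schema in the question (with $ref inlined as an enum).
    static Map<String, Object> sampleSchema() {
        Map<String, Object> groups = obj("type", "array",
            "items", obj("enum", List.of("Teachers", "Students")));
        Map<String, Object> details = obj("type", "object", "properties", obj(
            "first_name", obj("type", "string"),
            "last_name", obj("type", "string"),
            "groups", groups));
        Map<String, Object> person = obj("type", "object", "properties", obj("details", details));
        Map<String, Object> book = obj("type", "object", "properties", obj(
            "title", obj("type", "string"),
            "author", obj("type", "string")));
        return obj("type", "object", "properties", obj("person", person, "book", book));
    }

    // Returns enum values if the node (or its array items) declares them,
    // otherwise the names of its child properties, otherwise an empty list.
    @SuppressWarnings("unchecked")
    static List<String> attributesAt(Map<String, Object> schema, String path) {
        Map<String, Object> node = schema;
        for (String segment : path.split("\\.")) {
            Object props = node.get("properties");
            if (!(props instanceof Map)) return List.of();
            Object child = ((Map<String, Object>) props).get(segment);
            if (!(child instanceof Map)) return List.of();
            node = (Map<String, Object>) child;
        }
        if (node.get("items") instanceof Map) {
            node = (Map<String, Object>) node.get("items"); // descend into array items
        }
        List<String> result = new ArrayList<>();
        if (node.get("enum") instanceof List) {
            for (Object value : (List<Object>) node.get("enum")) result.add(String.valueOf(value));
        } else if (node.get("properties") instanceof Map) {
            result.addAll(((Map<String, Object>) node.get("properties")).keySet());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> schema = sampleSchema();
        System.out.println(attributesAt(schema, "person.details"));
        System.out.println(attributesAt(schema, "person.details.groups"));
        System.out.println(attributesAt(schema, "book.title"));
    }
}
```

A real schema would also need $ref resolution and handling for keywords such as allOf or oneOf, which this sketch deliberately leaves out.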

Validate JSON against JSON schema (in Java)

I am using com.github.fge.jsonschema. Working in Java.
Following is the JSON Schema.
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Employee",
"description": "employee description",
"type": "object",
"properties": {
"eid": {
"description": "The unique identifier for a emp",
"type": "integer"
},
"ename": {
"description": "Name of the emp",
"type": "string"
},
"qual":{
"$ref": "#/definitions/qualification"
}
},
"definitions": {
"qualification":
{
"description": "Qualification",
"type": "string"
}
}
}
and this is the JSON to validate against schema.
{
"eid":1000,
"ename": "mrun",
"qualification": "BE"
}
The type (i.e. integer or string) is validated properly for eid and ename when we pass wrong data.
For example:
{
"eid":"Mrun", //should be Integer
"ename": 72831287, //should be String
"qualification": 98372489 //should be String
}
But if we pass the wrong type only for qualification, the document validates as true (i.e. the type of qualification is not checked, maybe because it is nested).
I need to validate the whole JSON.
Is there any other way to validate nested objects in JSON?
Thanks in advance.
Your example
{
"eid":"Mrun",
"ename": 72831287,
"qualification": 98372489
}
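does not match your schema: the schema declares a property named "qual", not "qualification", so "qualification" is treated as an undeclared additional property and its type is never checked. Your schema expects objects like
{
"eid": 1000,
"ename": "mrun",
"qual": "BE"
}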
But if you just want to reuse the "qualification" definition, your schema should look similar to
"properties": {
"eid": {
"description": "The unique identifier for a emp",
"type": "integer"
},
"ename": {
"description": "Name of the emp",
"type": "string"
},
"qualification":{
"$ref": "#/definitions/qualification"
}
}
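To see why the mistyped "qualification" slipped through, here is a toy model of the rule involved. It is not the com.github.fge.jsonschema API; DeclaredOnlyCheck and its members are invented for illustration. It only mimics the JSON Schema behaviour that properties not listed under "properties" are accepted unchecked unless "additionalProperties" is false.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class DeclaredOnlyCheck {
    // Expected Java types for the properties declared in the question's schema.
    static final Map<String, Class<?>> DECLARED = Map.of(
        "eid", Integer.class,
        "ename", String.class,
        "qual", String.class);

    // Returns a list of validation errors; an empty list means the instance is accepted.
    static List<String> validate(Map<String, Object> instance, boolean additionalPropertiesAllowed) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Object> entry : instance.entrySet()) {
            Class<?> expected = DECLARED.get(entry.getKey());
            if (expected == null) {
                // Undeclared key: rejected only when additional properties are forbidden,
                // otherwise accepted without any type check.
                if (!additionalPropertiesAllowed) {
                    errors.add(entry.getKey() + ": not declared in schema");
                }
                continue;
            }
            if (!expected.isInstance(entry.getValue())) {
                errors.add(entry.getKey() + ": expected " + expected.getSimpleName());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, Object> instance = Map.of("eid", 1000, "ename", "mrun", "qualification", 98372489);
        System.out.println(validate(instance, true));   // default JSON Schema behaviour
        System.out.println(validate(instance, false));  // as with "additionalProperties": false
    }
}
```

In the real schema, adding "additionalProperties": false next to "properties" is the usual way to make misspelled or unexpected keys fail validation.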

How to aggregate a 'non - keyword' field in elasticsearch?

I am trying to write an Elasticsearch query that lists all distinct values held by various fields in a document. When the fields are of type keyword, the terms aggregation works fine and I can see the values with their counts listed in the buckets. But I don't get any result when I query for the distinct citrus fruit types. The mapping is shown below:
{
"vegetables":{
"type": "text",
"fields": {
"keyword" : {
"type" : "keyword",
"ignore_above": 256
}
}
},
"fruits": {
"properties": {
"citrus": {
"properties": {
"orange": {
"type": "long"
},
"lemon": {
"type": "long"
},
"kiwi": {
"type": "long"
}
}
}
}
}
}
and the result I am expecting is :
"aggregations": {
"distinct_citrusy_fruits": {
"buckets" : [
{
"key":"oranges",
"doc_count": 23
},
{
"key":"lemon",
"doc_count": 21
},
{
"key":"kiwi",
"doc_count": 23
}
]
}
}
When I run a terms aggregation on the "vegetables" field (which has a keyword sub-field), I get buckets like the ones above.
How can I get the distinct counts in this case? Also, I don't have the option to change the document format.
EDIT: The only workaround I have found so far is to call the mappings API and then parse the nested JSON in my code to get the key values. If there is a better solution, please add an answer here.
I think you cannot query or run aggregations on field names, only on values.
For the fruits, I would expect a mapping like the following:
{
"fruits": {
"properties": {
"citrus": {
"properties": {
"kind": {
"type": "keyword"
},
"count": {
"type": "long"
}
}
}
}
}
}
Maybe you can use the _field_names field which contains every fieldname that has a value. (https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-field-names-field.html)
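If the field names are already known (for instance from the mappings API workaround mentioned in the question), another option that may be worth trying is a filters aggregation with one exists query per field, which returns a doc_count per field name. The sketch below only builds such a request body as a string; ExistsFiltersAgg and build are hypothetical names, and the field list is taken from the mapping in the question.

```java
import java.util.List;

class ExistsFiltersAgg {
    // Builds a search body with a "filters" aggregation that has one named
    // "exists" filter per field, so each bucket counts documents containing that field.
    // Assumption: names contain no characters that need JSON escaping.
    static String build(String aggName, String pathPrefix, List<String> fields) {
        StringBuilder filters = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            String field = fields.get(i);
            if (i > 0) filters.append(",");
            filters.append("\"").append(field).append("\":")
                   .append("{\"exists\":{\"field\":\"")
                   .append(pathPrefix).append(".").append(field)
                   .append("\"}}");
        }
        return "{\"size\":0,\"aggs\":{\"" + aggName
            + "\":{\"filters\":{\"filters\":{" + filters + "}}}}}";
    }

    public static void main(String[] args) {
        System.out.println(build("distinct_citrusy_fruits", "fruits.citrus",
            List.of("orange", "lemon", "kiwi")));
    }
}
```

Note that the buckets then count documents in which each field is present, not the sum of the numeric values stored in it.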

Elasticsearch nested sort - mismatch between document and nested object used for sorting

I've been developing a new search API with AWS Elasticsearch (version 6.2) as backend.
Right now, I'm trying to support "sort" options for the API.
My mapping is as follows (unrelated fields not included):
{
"properties": {
"id": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"description": {
"type": "text"
},
"materialDefinitionProperties": {
"type": "nested",
"properties": {
"id": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
},
"analyzer": "case_sensitive_analyzer"
},
"value" : {
"type": "nested",
"properties": {
"valueString": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
}
}
I'm attempting to let users sort by property value (path: materialDefinitionProperties.value.valueString.raw).
Note that it's inside 2 levels of nested objects (materialDefinitionProperties and materialDefinitionProperties.value are nested objects).
To sort the results by the value of property with ID "PART NUMBER", my request for sorting is:
{
"fieldName": "materialDefinitionProperties.value.valueString.raw",
"nestedSort": {
"path": "materialDefinitionProperties",
"filter": {
"fieldName": "materialDefinitionProperties.id",
"value": "PART NUMBER",
"slop": 0,
"boost": 1
},
"nestedSort": {
"path": "materialDefinitionProperties.value"
}
},
"order": "ASC"
}
However, as I examined the response, the "sort" field does not match with document's property value:
{
"_index": "material-definition-index-v2",
"_type": "default",
"_id": "development_LITL4ZCNE",
"_source": {
"id": "LITL4ZCNE",
"description": [
"CPU, Intel, Cascade Lake, 8259CL, 24C, 210W, B1 Prod"
],
"materialDefinitionProperties": [
{
"id": "PART NUMBER",
"description": [],
"value": [
{
"valueString": "202-001193-001",
"isOriginal": true
}
]
}
]
},
"sort": [
"100-000018"
]
},
The document's PART NUMBER property is "202-001193-001", the "sort" field says "100-000018", which is the part number of another document.
It seems that there's a mismatch between the master document and nested object used for sorting.
This request worked well when there was only a small number of documents in the cluster. But once I backfilled the cluster with ~1 million records, the symptom appeared. I've also tried creating a new ES cluster, but the results are the same.
Sorting by other non-nested attributes worked well.
Did I misunderstand the concept of nested objects, or misuse the nested sort feature?
Any ideas appreciated!
This is a bug in Elasticsearch. Upgrading to 6.4.0 fixed the issue.
Issue tracker: https://github.com/elastic/elasticsearch/pull/32204
Release note: https://www.elastic.co/guide/en/elasticsearch/reference/current/release-notes-6.4.0.html
