Nested documents in Elasticsearch - Java

We have a program that will use Elasticsearch. We need to query using joins, which Elasticsearch does not support, so we are left with either nested documents or parent-child relationships. I have read that parent-child can cause significant performance issues, so we are thinking of going with nested documents.
We index and query products, but we also have customers and vendors. This is what I have in mind for the product mapping:
{
  "mappings" : {
    "products" : {
      "dynamic": false,
      "properties" : {
        "availability" : {
          "type" : "text"
        },
        "customer": {
          "type": "nested"
        },
        "vendor": {
          "type": "nested"
        },
        "color" : {
          "type" : "text"
        },
        "created_date" : {
          "type" : "text"
        }
      }
    }
  }
}
Here, customer and vendor are the nested sub-documents.
Does this mapping look correct? Since I am setting dynamic to false, do I need to specify the contents of the customer and vendor sub-documents? If so, how would I do that?

My team found parent/child relationships to be incredibly detrimental to our performance, so I think you're probably making a good decision to use nested fields.
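The reason nested fits here: with the default object type, Elasticsearch flattens an array of inner objects into one multi-valued field per sub-field, so a query can match a name from one customer and a tier from another. A nested field indexes each element as its own hidden document, so conditions are evaluated per element. The plain-Java sketch below only models that difference; it is not Elasticsearch code, and the Customer fields name and tier are hypothetical (records need Java 16+).

```java
import java.util.List;
import java.util.stream.Collectors;

// Simplified model of matching a product's customer array; not Elasticsearch code.
final class NestedVsObject {
    private NestedVsObject() {}

    // Hypothetical sub-document fields.
    record Customer(String name, String tier) {}

    // "object" mapping: the array is flattened into customer.name = [...] and
    // customer.tier = [...], so each condition may be met by a different customer.
    static boolean matchesFlattened(List<Customer> customers, String name, String tier) {
        List<String> names = customers.stream().map(Customer::name).collect(Collectors.toList());
        List<String> tiers = customers.stream().map(Customer::tier).collect(Collectors.toList());
        return names.contains(name) && tiers.contains(tier);
    }

    // "nested" mapping: each customer is its own hidden document, so both
    // conditions must hold for the same customer.
    static boolean matchesNested(List<Customer> customers, String name, String tier) {
        return customers.stream().anyMatch(c -> c.name().equals(name) && c.tier().equals(tier));
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(new Customer("alice", "gold"), new Customer("bob", "silver"));
        System.out.println(matchesFlattened(customers, "alice", "silver")); // true (cross-object false positive)
        System.out.println(matchesNested(customers, "alice", "silver"));    // false
    }
}
```

With the flattened model, "alice" and "silver" both appear somewhere in the product, which is exactly the kind of false match that nested documents avoid.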
With "dynamic": false, fields that are not defined in the mapping are not added to it: they are kept in _source but are not indexed, so you cannot search on them. You can either set dynamic to true, so those fields are added to the mapping as you index documents, or define the properties of the nested documents yourself:
{
  "mappings" : {
    "products" : {
      "dynamic": false,
      "properties" : {
        ...
        "customer": {
          "type": "nested",
          "properties": {
            "prop a": {...},
            "prop b": {...}
          }
        },
        "vendor": {
          "type": "nested",
          "properties": {
            "prop a": {...},
            "prop b": {...}
          }
        },
        ...
      }
    }
  }
}
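For illustration, here is what a fully specified version could look like, kept as a Java text block (Java 15+) that you could send as the body of the create-index request. The sub-fields id and name are hypothetical placeholders for your real customer and vendor fields. The vendor block also shows a middle ground: the dynamic setting can be overridden on an individual object or nested field, so "dynamic": true there lets new vendor sub-fields be mapped automatically while unknown top-level product fields are still ignored.

```java
// Hypothetical example mapping; the fields under customer/vendor are placeholders.
final class ProductMapping {
    private ProductMapping() {}

    static final String JSON = """
        {
          "mappings": {
            "products": {
              "dynamic": false,
              "properties": {
                "availability": { "type": "text" },
                "color":        { "type": "text" },
                "created_date": { "type": "text" },
                "customer": {
                  "type": "nested",
                  "properties": {
                    "id":   { "type": "keyword" },
                    "name": { "type": "text" }
                  }
                },
                "vendor": {
                  "type": "nested",
                  "dynamic": true,
                  "properties": {
                    "id": { "type": "keyword" }
                  }
                }
              }
            }
          }
        }
        """;
}
```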

Related

Jackson: deserialize JSON extract deep attribute into parent class

I had some trouble wording the title, so if the question should be reworded, I'd be happy to repost it for clarification. :)
Problem: I have this JSON structure:
{
  "name": "Bob",
  "attributes": {
    "evaluation": {
      "stats": [
        {
          "testDate": "2020-02-04",
          "score": 50
        },
        {
          "testDate": "2020-04-01",
          "score": 90
        },
        {
          "testDate": "2020-05-10",
          "score": 85
        }
      ],
      "survey": {...}
    },
    "interests": {...},
    "personality": [...],
    "someRandomUnknownField": {...}
  }
}
attributes can contain any number of arbitrary fields; the only one we want to extract is evaluation.stats. I want to be able to deserialize into the following classes:
public class Person {
    String name;
    Map<String, Object> attributes;
    List<Stat> stats;
}
public class Stat {
    LocalDate date;
    int score;
}
When I serialize it back to JSON, I expect something like this:
{
  "name": "Bob",
  "attributes" : {
    "evaluation": {
      "survey": {...}
    },
    "interests" : {...},
    "personality": {...},
    "someRandomUnknownField": {...}
  },
  "stats": [
    {
      "testDate": "2020-02-04",
      "score": 50
    },
    {
      "testDate": "2020-04-01",
      "score": 90
    },
    {
      "testDate": "2020-05-10",
      "score": 85
    }
  ]
}
I could technically map the whole Person class to its own custom deserializer, but I want to leverage the built-in Jackson deserializers and annotations as much as possible. It's also imperative that stats is extracted (i.e., stats shouldn't also exist under attributes). I'm having trouble finding a simple and maintainable serialization/deserialization scheme. Any help would be appreciated!
I'm not sure if this meets your criterion for a simple and maintainable serialization/deserialization scheme, but you can manipulate the JSON tree to transform your starting JSON into the structure you need:
Assuming inputJson is a string containing your initial JSON:
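import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;

ObjectMapper mapper = new ObjectMapper();
mapper.registerModule(new JavaTimeModule());
JsonNode root = mapper.readTree(inputJson);
// attach the existing "stats" array at the root of the JSON (same node instance, not a deep copy):
ArrayNode statsNode = (ArrayNode) root.path("attributes").path("evaluation").path("stats");
((ObjectNode) root).set("stats", statsNode);
// delete the original "stats" entry under attributes.evaluation:
ObjectNode evalNode = (ObjectNode) root.path("attributes").path("evaluation");
evalNode.remove("stats");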
This now gives you the JSON you need to deserialize to your Person class:
Person person = mapper.treeToValue(root, Person.class);
When you serialize the Person object you get the following JSON output:
{
  "name" : "Bob",
  "attributes" : {
    "evaluation" : {
      "survey" : { }
    },
    "interests" : { },
    "personality" : [ ],
    "someRandomUnknownField" : { }
  },
  "stats" : [ {
    "score" : 50,
    "testDate" : "2020-02-04"
  }, {
    "score" : 90,
    "testDate" : "2020-04-01"
  }, {
    "score" : 85,
    "testDate" : "2020-05-10"
  } ]
}
Note that to get this to work, you need Jackson's Java 8 date/time (java.time) module:
<dependency>
  <groupId>com.fasterxml.jackson.datatype</groupId>
  <artifactId>jackson-datatype-jsr310</artifactId>
  <version>2.11.3</version>
</dependency>
It is registered in the code above with:
mapper.registerModule(new JavaTimeModule());
I also annotated the LocalDate field in the Stat class, as follows:
@JsonProperty("testDate")
@JsonFormat(shape = JsonFormat.Shape.STRING, pattern = "yyyy-MM-dd")
private LocalDate date;
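The annotation matters because, by default, JavaTimeModule writes a LocalDate as an array such as [2020,2,4] (SerializationFeature.WRITE_DATES_AS_TIMESTAMPS is enabled by default); with a @JsonFormat string format you get "2020-02-04" back. The pattern itself is java.time DateTimeFormatter pattern syntax. The JDK-only sketch below shows what "yyyy-MM-dd" does with the testDate values; StatDates is a hypothetical helper, not part of Jackson.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper using the same "yyyy-MM-dd" pattern as the @JsonFormat above.
final class StatDates {
    private StatDates() {}

    static final DateTimeFormatter TEST_DATE = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // "2020-02-04" -> LocalDate 2020-02-04
    static LocalDate parse(String testDate) {
        return LocalDate.parse(testDate, TEST_DATE);
    }

    // LocalDate 2020-05-10 -> "2020-05-10"
    static String format(LocalDate date) {
        return date.format(TEST_DATE);
    }
}
```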
Very minor note: In your starting JSON (in the question) you showed this:
"personality": [...],
But in your expected final JSON you had this:
"personality": {...},
I assumed this was probably a typo, and it should be an array, not an object, in both cases.
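One caveat with the tree-manipulation code: if evaluation or stats is missing, path(...) returns a MissingNode and the (ArrayNode) cast throws a ClassCastException. The sketch below shows the same extract-and-remove step with guards, using only the JDK on the generic Map<String, Object> tree that mapper.readValue(inputJson, Map.class) gives you (Jackson builds mutable LinkedHashMap/ArrayList instances for untyped JSON). StatsExtractor and its methods are hypothetical names; afterwards you could hand the map to mapper.convertValue(root, Person.class).

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper working on a plain Map tree instead of JsonNode.
final class StatsExtractor {
    private StatsExtractor() {}

    // Removes attributes.evaluation.stats and returns it; returns an empty list
    // (and leaves the tree unchanged) if any level is missing or has the wrong shape.
    @SuppressWarnings("unchecked")
    static List<Object> extractStats(Map<String, Object> root) {
        if (!(root.get("attributes") instanceof Map)) return List.of();
        Map<String, Object> attributes = (Map<String, Object>) root.get("attributes");
        if (!(attributes.get("evaluation") instanceof Map)) return List.of();
        Map<String, Object> evaluation = (Map<String, Object>) attributes.get("evaluation");
        if (!(evaluation.get("stats") instanceof List)) return List.of();
        return (List<Object>) evaluation.remove("stats");
    }

    // Mirrors the ObjectNode.set/remove steps: moves the stats list to the top level.
    static void hoistStats(Map<String, Object> root) {
        root.put("stats", extractStats(root));
    }
}
```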

JSON schema for overrides - best practices

I have a use case where I define an object (a set of properties), and for some scenarios/criteria I need to override those properties with different values. What is the best way to design the schema?
Below is a sample I came up with:
{
  "ActionType": "actionType",
  "ActionData": {
    "Template": {
      "version": 1,
      "feedback": "Action Feedback.",
      "Overrides": [
        {
          "criteria": {
            "allOf": {
              "Country": "JP"
            }
          },
          "Template": {
            "version": 1,
            "feedback": "Japanese Feedback"
          }
        }
      ]
    }
  }
}
I have such overrides for different types of objects in the JSON. My main focus is usability: whoever reads the JSON will have to replace the properties with the appropriate overrides. Also, what would my Java models look like? As far as I can see, there would be self-referencing objects.
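One possible way to model this in Java, sketched under assumptions: the records below (Java 16+) mirror the sample JSON, and Template is self-referencing through its list of overrides. The precedence rule (the first matching override wins, and fields the override leaves null fall back to the base template) and the names Criteria, TemplateOverride and resolve are hypothetical choices, not something the question specifies.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical model mirroring the sample JSON.
record Criteria(Map<String, String> allOf) {
    // True when every key/value pair in allOf is present in the evaluation context.
    boolean matches(Map<String, String> context) {
        return allOf.entrySet().stream()
                .allMatch(e -> Objects.equals(context.get(e.getKey()), e.getValue()));
    }
}

// An override holds another (partial) Template, which makes the model self-referencing.
record TemplateOverride(Criteria criteria, Template template) {}

record Template(Integer version, String feedback, List<TemplateOverride> overrides) {
    // Applies the first matching override (assumed precedence rule); null fields
    // in the override fall back to this template's values.
    Template resolve(Map<String, String> context) {
        List<TemplateOverride> candidates = overrides == null ? List.of() : overrides;
        for (TemplateOverride o : candidates) {
            if (o.criteria().matches(context)) {
                Template t = o.template();
                return new Template(
                        t.version() != null ? t.version() : version,
                        t.feedback() != null ? t.feedback() : feedback,
                        List.of());
            }
        }
        return new Template(version, feedback, List.of());
    }
}
```

Resolving in code like this means readers of the resolved output see final values and do not have to apply overrides by hand.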

How to tell whether a JSON schema is compatible with another in Java?

For example, I have a JSON schema that looks like this:
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "billing_address": { "$ref": "#/definitions/address" },
    "shipping_address": { "$ref": "#/definitions/address" }
  },
  "definitions": {
    "address": {
      "type": "object",
      "properties": {
        "street_address": { "type": "string" },
        "city": { "type": "string" },
        "state": { "type": "string" }
      },
      "required": ["street_address", "city", "state"]
    }
  }
}
This schema describes an object with two properties, billing_address and shipping_address, both of type address, which contains three properties: street_address, city and state.
Now I have another, "larger" schema:
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "billing_address": { "$ref": "#/definitions/address" },
    "shipping_address": { "$ref": "#/definitions/address" },
    "new_address": { "$ref": "#/definitions/address" }
  },
  "definitions": {
    "address": {
      "type": "object",
      "properties": {
        "street_address": { "type": "string" },
        "city": { "type": "string" },
        "state": { "type": "string" },
        "zip_code": { "type": "string" }
      },
      "required": ["street_address", "city", "state"]
    }
  }
}
As you can see, I added a new property new_address to the schema, and address has a new property called zip_code, which is not required.
So any object that is valid against the old JSON schema should also be valid against the new one. In that case, we say the new schema is compatible with the old one (in other words, the new schema extends the old one without modifying it).
The question is: how can I check in Java whether one schema is compatible with another? More complicated cases should also be handled, for example the "minimum" keyword on a number field.
Just test it. In my current project, I write the following contract tests:
1) Given a Java domain object, I serialize it to JSON and compare the result with reference JSON data. I use https://github.com/skyscreamer/JSONassert to compare the two JSON strings.
The reference JSON data should conform to the 'smaller schema'.
2) Given sample JSON data, I deserialize it into my domain object and verify that deserialization succeeded by comparing the result with the expected model object. The sample JSON data should conform to the 'larger schema'.
This test verifies that 'larger schema' JSON data is backward compatible with your 'smaller schema' domain model.
I write these tests at each level of my domain model: one for the top-level object, and one for each non-trivial nested object. That requires more test code and more sample JSON data, but gives much better confidence. When something fails, the error messages are more targeted: you know exactly which level of the hierarchy is broken (JSONassert error messages can list many errors and be hard to read for deeply nested object hierarchies). So it's a trade-off between:
* time spent maintaining test code and data
* quality of error messages
Such tests are fast: they only need JSON serialization and deserialization.
Spring Cloud Contract (https://github.com/spring-cloud/spring-cloud-contract) can help you write contract tests for REST APIs, messaging, etc., but for simple cases the procedure above may be good enough.
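For simple cases you can also check compatibility directly. The sketch below is not a JSON Schema library: it uses a deliberately tiny, hypothetical model (Prop, ObjectSchema, SchemaCompat) that covers only type, required and minimum, and it would need to be extended recursively for $ref'd objects such as address and for other keywords. One subtlety is encoded as an assumption: because additionalProperties defaults to true, an old instance could already contain a zip_code of some other type, so adding a typed property is only safe if old instances never carry undeclared properties.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical, deliberately tiny schema model: a type name plus an optional minimum.
record Prop(String type, Double minimum) {}

record ObjectSchema(Map<String, Prop> properties, Set<String> required) {}

final class SchemaCompat {
    private SchemaCompat() {}

    // True if every instance valid under oldSchema is also valid under newSchema,
    // within this simplified model. Assumes instances carry no undeclared properties
    // (as if the old schema had additionalProperties: false).
    static boolean isBackwardCompatible(ObjectSchema oldSchema, ObjectSchema newSchema) {
        // The new schema must not require anything the old one did not require.
        if (!oldSchema.required().containsAll(newSchema.required())) return false;
        for (Map.Entry<String, Prop> e : oldSchema.properties().entrySet()) {
            Prop before = e.getValue();
            Prop after = newSchema.properties().get(e.getKey());
            if (after == null) continue; // no longer declared, so no longer constrained
            if (!after.type().equals(before.type())) return false;
            // A minimum may only stay the same or become looser.
            if (after.minimum() != null
                    && (before.minimum() == null || after.minimum() > before.minimum())) {
                return false;
            }
        }
        return true;
    }
}
```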

Is it possible to use Spring's annotations to define Completion Suggester for a mapping in Elasticsearch?

I currently have the following POJO.
@Document(indexName="ws",type="vid")
public class Vid {
    @Id
    private String id;
    @Field(type=FieldType.String, index=FieldIndex.not_analyzed)
    private List<String> tags;
}
A JSON that represents this POJO is as follows.
{
  "id" : "someId",
  "tags" : [ "one", "two", "three" ]
}
What I want is to define the mapping for the tags field so that I can use the values in an auto-complete search box. This is supported by Elasticsearch's Completion Suggester. The documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-completion.html seems to suggest that I have to set up the mapping as follows.
{
  "vid": {
    "properties": {
      "id": {
        "type": "string"
      },
      "tags": {
        "type": "completion",
        "index_analyzer": "simple",
        "search_analyzer": "simple",
        "payloads": true
      }
    }
  }
}
However, that would mean I would have to change my POJO and its JSON representation:
{
  "id": "someId",
  "tags": {
    "input": [ "one", "two", "three" ]
  }
}
I found another good page about Completion Suggesters here: http://blog.qbox.io/quick-and-dirty-autocomplete-with-elasticsearch-completion-suggest. However, that page seems to suggest duplicating the tags:
{
  "id": "someId",
  "tags": [ "one", "two", "three" ],
  "tags_suggest": {
    "input": [ "one", "two", "three" ]
  }
}
Lastly, I found this javadoc page from spring-data-elasticsearch at http://docs.spring.io/spring-data/elasticsearch/docs/current/api/index.html?org/springframework/data/elasticsearch/core/completion/Completion.html. I am sure this class has something to do with Completion Suggesters but I don't know how to use it.
Is there any way I can just use Spring annotations to define the Elasticsearch mapping for Completion Suggester?
Absolutely, yes.
You can configure your entity like this:
...
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.CompletionField;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.core.completion.Completion;
...
@Document(indexName = "test-completion-index", type = "annotated-completion-type", indexStoreType = "memory", shards = 1, replicas = 0, refreshInterval = "-1")
public class YourEntity {
    @Id
    private String id;
    private String name;
    @CompletionField(payloads = true, maxInputLength = 100)
    private Completion suggest;
    ...
}
Check this link for an example.
I am not experienced with that, but maybe this annotation can be helpful for you:
Link to Spring Data Elasticsearch documentation
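If you go with the duplicated layout from the qbox article instead, one way to keep the redundancy manageable is to derive tags_suggest from tags whenever the document is built, so tags stays the single source of truth. A JDK-only sketch, with a hypothetical VidDocuments.toSource helper that produces the document source as a Map (this is not the Spring Data Completion API):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spring Data Elasticsearch.
final class VidDocuments {
    private VidDocuments() {}

    // Builds the source for the "tags + tags_suggest" layout; the completion
    // input is always copied from tags, so the two fields cannot drift apart.
    static Map<String, Object> toSource(String id, List<String> tags) {
        List<String> copy = List.copyOf(tags);
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("id", id);
        source.put("tags", copy);
        source.put("tags_suggest", Map.of("input", copy));
        return source;
    }
}
```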

Jackson ignore properties using the same class

I want to use the same class to deserialize and re-serialize two different JSON documents. Let me explain.
I have these JSON documents:
//JSON A
{
  "flavors": [
    {
      "id": "52415800-8b69-11e0-9b19-734f1195ff37",
      "name": "256 MB Server",
      "ram": 256,
      "OS-FLV-DISABLED:disabled": true,
      "links": [
        {
          "rel": "self",
          "href": "http://www.myexample.com"
        },
        {
          "rel": "bookmark",
          "href": "http://www.myexample.com"
        }
      ]
    },
    ...
  ]
}
//JSON B
{
  "flavors": [
    {
      "id": "52415800-8b69-11e0-9b19-734f1195ff37",
      "name": "256 MB Server",
      "links": [
        {
          "rel": "self",
          "href": "http://www.myexample.com"
        },
        {
          "rel": "bookmark",
          "href": "http://www.myexample.com"
        }
      ]
    },
    ...
  ]
}
As you can see, JSON B has all the fields of JSON A except "ram" and "OS-FLV-DISABLED:disabled". The classes I used are the following:
public class Flavor {
    private String id;
    private String name;
    private List<Link> links;
    private int ram;
    private boolean OS_FLV_DISABLED_disabled;
    //constructor and getter/setter
}
@XmlRootElement
public class GetFlavorsResponse {
    private List<Flavor> flavors;
    //constructor and getter/setter
}
In addition, I put the annotation @XmlElement(name = "OS-FLV-DISABLED:disabled") just above the getter isOS_FLV_DISABLED_disabled; otherwise Jackson doesn't recognize this property.
When I receive JSON A there is no problem: the resulting JSON is again JSON A. But when I receive JSON B, the result of deserializing and re-serializing it is:
//JSON C
{
  "flavors": [
    {
      "id": "52415800-8b69-11e0-9b19-734f1195ff37",
      "name": "256 MB Server",
      "ram": 0,
      "OS-FLV-DISABLED:disabled": false,
      "links": [
        {
          "rel": "self",
          "href": "http://www.myexample.com"
        },
        {
          "rel": "bookmark",
          "href": "http://www.myexample.com"
        }
      ]
    },
    ...
  ]
}
My first thought was that properties missing from the JSON simply keep their Java default values, that is, 0 and false for "ram" and "OS-FLV-DISABLED:disabled" respectively. So I put the annotation
@JsonSerialize(include=JsonSerialize.Inclusion.NON_DEFAULT)
just above the Flavor class. This works, but the problem is that when I receive JSON A in which "ram" and "OS-FLV-DISABLED:disabled" are 0 and false (a possible situation), the result of the process above is JSON B, since these two fields are skipped.
Since this is not the solution to my problem, I looked further and read that some people suggest using @JsonView or @JsonFilter, but I don't understand how to apply these Jackson features in this case.
I hope this is clear; thank you in advance for your help.
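One thing you can try is to make ram and OS_FLV_DISABLED_disabled Integer and Boolean respectively. That way, if no values for these two properties come in the JSON, they will be null rather than 0 and false. Then use the annotation @JsonInclude(Include.NON_NULL) to avoid serializing null properties. (That is the Jackson 2 annotation; on Jackson 1.x, which the @JsonSerialize(include=...) in the question suggests, the equivalent is @JsonSerialize(include = JsonSerialize.Inclusion.NON_NULL).)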
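The reason this works where NON_DEFAULT does not: with primitives, "missing" and "explicitly 0/false" end up as the same value, while boxed types keep null for "missing". The JDK-only simulation below (not Jackson; FlavorFields, read and write are hypothetical names) mimics deserializing into Integer/Boolean fields and then writing only non-null values, as NON_NULL inclusion would.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simulation of boxed fields + NON_NULL inclusion; not Jackson itself.
final class FlavorFields {
    Integer ram;       // null = absent from the input, 0 = explicitly 0
    Boolean disabled;  // stands in for "OS-FLV-DISABLED:disabled"

    // "Deserialize": absent keys stay null instead of becoming 0/false.
    static FlavorFields read(Map<String, Object> json) {
        FlavorFields f = new FlavorFields();
        f.ram = (Integer) json.get("ram");
        f.disabled = (Boolean) json.get("OS-FLV-DISABLED:disabled");
        return f;
    }

    // "Serialize": skip nulls only, so explicit 0/false are still written.
    Map<String, Object> write() {
        Map<String, Object> out = new LinkedHashMap<>();
        if (ram != null) out.put("ram", ram);
        if (disabled != null) out.put("OS-FLV-DISABLED:disabled", disabled);
        return out;
    }
}
```

JSON A with "ram": 0 and "OS-FLV-DISABLED:disabled": false round-trips unchanged, while JSON B round-trips without the two keys.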
