How to design data for complex recursive entities for persistence? - java

Assume I have an entity called "Person":
{
  "UnqIdr": 125,
  "FrstNm": "Mark",
  "LastNm": "Antony",
  "Gndr": "Male",
  "DtOfBirth": "06-09-2020",
  "CtctDtls": {
    "Addr": [
      {
        "UnqIdr": "10001",
        "Ln1": "Street name",
        "Ln2": "Block Number",
        "Ln3": "Ward number",
        "Cty": "New York",
        "ZipCd": "60034",
        "Stat": "New Jersey",
        "Ctry": "North America",
        "IsPrmy": true
      }
    ],
    "PhneNb": [
      {
        "Nm": "Principal",
        "CtryCd": "+1",
        "Nb": "1234567890",
        "IsPrmy": true
      }
    ],
    "Email": "abc@def.com",
    "CtctURL": "www.def.com",
    "SclMdia": {
      "FacebookURL": "www.facebook.com/def",
      "LinkedInURL": "www.linkedin.com/us/def",
      "TwitterURL": "www.twitter.com/3634556"
    }
  },
  "IdntyProof": [{
    "UnqIdr": 16537,
    "Ctry": "India",
    "IdntyTp": 6548,
    "IdntyIdr": "INYHGB3462",
    "IsVerified": true,
    "VldFrm": "16-01-2000",
    "VldTill": "4-12-2023"
  }],
  "PrsnlIdnty": {
    "BldGrp": "A",
    "Id": [{
      "Nt": "Mole in right arm"
    }]
  },
  "Ethncty": "Nadar",
  "Rlgn": "Hindu",
  "Ntnlty": "Indian",
  "PrvsNtnlty": [{
    "Ntnlty": "Indian",
    "IdntyProof": [{
      "UnqIdr": 16537,
      "Ctry": "India",
      "IdntyTp": 6548,
      "IdntyIdr": "INYHGB3462",
      "IsVerified": true,
      "VldFrm": "16-01-2000",
      "VldTill": "4-12-2023"
    }]
  }],
  "MrtlSts": "Married",
  "Rltsh": [{
    "RltshTp": "Spouse",
    "UnqIdr": 134
  }, {
    "RltshTp": "Divorcee",
    "UnqIdr": 130
  }]
}
However, the same information applies to an Employee, a Customer, and a few other entities.
The structure of an employee might be:
{
  "UnqIdr": 125,
  "Department": "Chem Lab",
  "Person": {...}
}
However, when building the logic, we found that an employee can also be a customer. Hence we thought of bundling the data as follows:
{
  //person-info
  "employee-info": {},
  "customer-info": {}
}
Now the problem comes up: how do we query by employee-info or customer-info?
I know this is a data-design question; for context, we are using Java 11 and Spring Data JPA.
Additionally, what would be effective ways to design the solution? Even using a NoSQL database is open for discussion.

Look into data normalization for relational databases.
A simple solution is to store the Person object in a separate table and reference it by a personId field.
So the employee structure becomes:
{
  "UnqIdr": 125,
  "Department": "Chem Lab",
  "PersonId": 420
}

Relational databases are made for exactly this kind of data domain (eventual consistency for people... no, please).
Look here for some database design involving CUSTOMER and EMPLOYEE:
https://www.oracletutorial.com/getting-started/oracle-sample-database/
Note that you can still use Java inheritance for the common Person attributes.
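A minimal JPA sketch of that normalized design (entity and field names here are illustrative, not taken from the question's schema):

import java.util.Optional;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;

import org.springframework.data.jpa.repository.JpaRepository;

// Each top-level type below would live in its own file in a real project.

@Entity
class Person {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    Long id;

    String firstName;
    String lastName;
    // addresses, phone numbers, identity proofs, etc. would be
    // normalized into their own tables the same way
}

@Entity
class Employee {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    Long id;

    String department;

    // reference the person instead of embedding it, so the same
    // Person row can also back a Customer
    @ManyToOne(optional = false)
    @JoinColumn(name = "person_id")
    Person person;
}

@Entity
class Customer {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    Long id;

    @ManyToOne(optional = false)
    @JoinColumn(name = "person_id")
    Person person;
}

// "Find the employee record for this person" is then a derived query:
interface EmployeeRepository extends JpaRepository<Employee, Long> {
    Optional<Employee> findByPersonId(Long personId);
}

Whether Employee holds a @ManyToOne or @OneToOne to Person is a modeling choice; the key point is that employee-ness and customer-ness become rows referencing the same Person rather than copies of it.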

Related

Flattening a heavily nested JSON in Java - Time Complexity

{
  "id": "12345678",
  "data": {
    "address": {
      "street": "Address 1",
      "locality": "test loc",
      "region": "USA"
    },
    "country_of_residence": "USA",
    "date_of_birth": {
      "month": 2,
      "year": 1988
    },
    "links": {
      "self": "https://testurl"
    },
    "name": "John Doe",
    "nationality": "XY",
    "other": [
      {
        "key1": "value1",
        "key2": "value2"
      },
      {
        "key1": "value1",
        "key2": "value2"
      }
    ],
    "notified_on": "2016-04-06"
  }
}
I am trying to read data from a GraphQL API that returns a paginated JSON response, and I need to write it into a CSV. I have been exploring Spring Batch for the implementation: I would read the JSON data in the ItemReader, flatten each JSON entry in the ItemProcessor, and then write the flattened data into a CSV in the ItemWriter. While I could use something like Jackson for flattening the JSON, I am concerned about possible performance implications if the JSON data is heavily nested.
expected output:
id, data.address.street, data.address.locality, data.address.region, data.country_of_residence, data.date_of_birth.month, data.date_of_birth.year, data.links.self, data.name, data.nationality, data.other (using jsonPath), data.notified_on
I need to process more than a million records. While I believe flattening the JSON would be a linear operation, O(n) in the number of nodes, I was still wondering if there could be other caveats if the structure gets severely nested.
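For what it's worth, here is a minimal sketch of such a flattener in plain Jackson (the class and method names are made up for illustration). It walks the tree once, so the cost stays linear in the number of nodes regardless of nesting depth; the one real caveat at extreme depth is the recursion stack, which an iterative walk with an explicit stack would avoid:

import java.util.LinkedHashMap;
import java.util.Map;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonFlattener {

    // Recursively collect leaf values under dotted key paths.
    static void flatten(String prefix, JsonNode node, Map<String, String> out) {
        if (node.isObject()) {
            node.fields().forEachRemaining(e ->
                    flatten(prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey(),
                            e.getValue(), out));
        } else if (node.isArray()) {
            for (int i = 0; i < node.size(); i++) {
                flatten(prefix + "[" + i + "]", node.get(i), out);
            }
        } else {
            out.put(prefix, node.asText());
        }
    }

    public static void main(String[] args) throws Exception {
        String json = "{\"id\":\"12345678\",\"data\":{\"address\":{\"street\":\"Address 1\"},"
                + "\"name\":\"John Doe\"}}";
        Map<String, String> flat = new LinkedHashMap<>();
        flatten("", new ObjectMapper().readTree(json), flat);
        flat.forEach((k, v) -> System.out.println(k + " = " + v));
        // id = 12345678
        // data.address.street = Address 1
        // data.name = John Doe
    }
}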

How to query MongoDb subdocuments in Java

Dear MongoDB experts out there! I am new to Mongo and am currently working with a MongoDB from Java.
I have a collection called "teams" with the following example structure:
[
  {
    "id": "123",
    "name": "Dev1",
    "employees": [
      {"name": "John", "age": 30},
      {"name": "Jane", "age": 30}
    ]
  },
  {
    "id": "456",
    "name": "Dev2",
    "employees": [
      {"name": "Mike", "age": 30},
      {"name": "Oscar", "age": 27}
    ]
  }
]
I want a query which returns an array of all employees that are 30 years old. So the expected result would be:
[
  {"name": "John", "age": 30},
  {"name": "Jane", "age": 30},
  {"name": "Mike", "age": 30}
]
It would be even better to get only the employees' names (since I already know the age I searched for), like:
[
  {"name": "John"},
  {"name": "Jane"},
  {"name": "Mike"}
]
I have a MongoCollection object:
MongoCollection<Document> collection = mongoClient
    .getDatabase(databaseName)
    .getCollection("teams");
My question: Is it possible to retrieve my expected result from the MongoDb? If so, which operations do I have to call on my collection object?
Approach 1: Find with $elemMatch projection
db.getCollection('teams').find(
  { "employees.age": 30 },
  { "employees": { "$elemMatch": { "age": 30 } }, "_id": 0 }
)
Output Format:
{
  "employees" : [
    {
      "name" : "John",
      "age" : 30.0
    }
  ]
}
{
  "employees" : [
    {
      "name" : "Mike",
      "age" : 30.0
    }
  ]
}
Approach 2: Aggregate with $unwind and key renaming using $project
db.getCollection('teams').aggregate([
  { "$match": { "employees.age": 30 } },
  { "$unwind": "$employees" },
  { "$match": { "employees.age": 30 } },
  { "$project": { "name": "$employees.name", "_id": 0 } }
])
Output format:
{ "name" : "John" }
{ "name" : "Jane" }
{ "name" : "Mike" }
Approach 1 would be faster, but requires additional work at the application layer; note that an $elemMatch projection returns only the first matching array element per document, which is why Jane is missing from its output above. It is generally recommended to offload formatting tasks to the application servers rather than to the database itself.
Approach 2 gives you the formatted results directly, doing both the querying and the formatting on the database server.
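Since the question asks which operations to call on the Java collection object, here is a sketch of Approach 2 using the MongoDB Java driver's aggregation builders (assuming the synchronous com.mongodb.client driver and the collection variable from the question):

import com.mongodb.client.model.Aggregates;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Projections;
import org.bson.Document;
import org.bson.conversions.Bson;
import java.util.Arrays;
import java.util.List;

// "collection" is the MongoCollection<Document> from the question
List<Bson> pipeline = Arrays.asList(
    Aggregates.match(Filters.eq("employees.age", 30)),   // shrink the candidate set first
    Aggregates.unwind("$employees"),                      // one document per employee
    Aggregates.match(Filters.eq("employees.age", 30)),    // keep only the 30-year-olds
    Aggregates.project(Projections.fields(
        Projections.excludeId(),
        Projections.computed("name", "$employees.name"))));

for (Document doc : collection.aggregate(pipeline)) {
    System.out.println(doc.toJson());  // {"name": "John"}, {"name": "Jane"}, {"name": "Mike"}
}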

How do I create an ElasticSearch query without knowing what the field is?

Someone is putting JSON objects into Elasticsearch, and I don't know any of the fields in advance. I would like to search all the fields for a given value using a matchQuery.
I understand that _all is deprecated, and copy_to doesn't work because I don't know what fields are available beforehand. Is there a way to accomplish this without knowing which fields to search beforehand?
Yes, you can achieve this using a custom _all field (which I called my_all) and a dynamic template for your index. Basically, the idea is to have a generic mapping for all fields with a copy_to setting pointing to the my_all field. I've also added store: true for the my_all field, but only to show that it works; in practice you won't need it.
So let's go and create the index:
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic_templates": [
        {
          "all_fields": {
            "match": "*",
            "mapping": {
              "copy_to": "my_all"
            }
          }
        }
      ],
      "properties": {
        "my_all": {
          "type": "text",
          "store": true
        }
      }
    }
  }
}
Then index a document:
PUT my_index/_doc/1
{
  "test": "the cat drinks milk",
  "age": 10,
  "alive": true,
  "date": "2018-03-21T10:00:00.123Z",
  "val": ["data", "data2", "data3"]
}
Finally, we can search using the my_all field and also show its content (because we stored it) in addition to the _source of the document:
GET my_index/_search?q=my_all:cat&_source=true&stored_fields=my_all
And the result is shown below:
{
  "_index": "my_index",
  "_type": "_doc",
  "_id": "1",
  "_score": 0.2876821,
  "_source": {
    "test": "the cat drinks milk",
    "age": 10,
    "alive": true,
    "date": "2018-03-21T10:00:00.123Z",
    "val": [
      "data",
      "data2",
      "data3"
    ]
  },
  "fields": {
    "my_all": [
      "the cat drinks milk",
      "10",
      "true",
      "2018-03-21T10:00:00.123Z",
      "data",
      "data2",
      "data3"
    ]
  }
}
So, provided you can create the index and its mapping yourself, you'll be able to search whatever people send to it.
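If you are querying from Java, the same search is a matchQuery on my_all. A minimal sketch, assuming the Elasticsearch High Level REST Client of that era and a pre-built RestHighLevelClient named client:

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// search the catch-all field instead of naming every unknown field
SearchRequest request = new SearchRequest("my_index");
request.source(new SearchSourceBuilder()
        .query(QueryBuilders.matchQuery("my_all", "cat")));

SearchResponse response = client.search(request, RequestOptions.DEFAULT);
response.getHits().forEach(hit -> System.out.println(hit.getSourceAsString()));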

GPath on nested objects

I'm not sure what I'm doing wrong here, and I'm having a hell of a time getting this to work properly. Using this JSON:
{
  "books": [
    {
      "category": "reference",
      "author": {
        "name": "Nigel Rees",
        "age": 45
      },
      "title": "Sayings of the Century",
      "price": 8.95,
      "tax": 7.00
    },
    {
      "category": "reference",
      "author": {
        "name": "Evelyn Waugh",
        "age": 30
      },
      "title": "A cheap book",
      "price": 6.00,
      "tax": 3.00
    }
  ]
}
I'm not able to extract the books where the author's age is 45, for example. I've tried things like the following (with the document root set to books):
findAll {it.author.age == 45}
findAll {it.author}.findAll {it.age == 45}
findAll {it.author}*.each {it.age == 45 }
I still get back all of the records that have an age item. The objects can be arbitrary; one might not have an author record, etc. And I want the root book object returned.
I feel like there's something really obvious I'm missing, but the docs seem to only cover one level of key-values. Maybe it doesn't support it?
Thanks!
Here it is:
books.findAll { it.author.age == 45 }
Your version (findAll { it.author.age == 45 }) doesn't work because you start from the root, so the it variable refers to the books object itself, which has no author field.

how to perform solr joins

Hi, I am trying to join two documents from two collections in Solr.
Solr document 1, Collection A:
{
  "student_id": "123",
  "name": "rahul",
  "address": "addr001"
}
Solr document 2, Collection B:
{
  "address_id": "addr001",
  "street_name": "XYZ street",
  "colony": "ABC colony"
}
I expect my final document to look like the one below after the join:
{
  "student_id": "123",
  "name": "rahul",
  "street_name": "XYZ street",
  "colony": "ABC colony"
}
Can anyone show me how to perform joins in Solr, with a relevant example for my requirements?
