Indexing heavily nested JSON data in Solr using DIH - java

I am using DIH to import data from a NoSQL data source. The format looks something like this:
{
"id" : "123-145-app",
"name" : "apple",
"type" : "electronic",
"information": {
"category":["tablets","laptops","mobile"],
"stores": [
{
"name": "imagine",
"location" : "DLH"
},
{
"name": "abc",
"location" : "BLR"
}
],
"head_office" : "US"
}
}
When I try to index this using the approach described at
https://lucene.apache.org/solr/guide/6_6/transforming-and-indexing-custom-json.html
I get an error stating "Unknown operation for the an atomic update, operation ignored", and the record with that data is skipped.
The documentation also does not explain how to configure the split through a configuration file or schema.xml.
Can someone please help me with this?

Related

MongoDB - Update parts of object

I have a collection that stores one document per execution flow.
Every flow includes "processes", and each process includes steps.
So I end up with a 'flows' collection that has documents that look like this:
{
"name" : "flow1",
"description" : "flow 1 description",
"processes" : [
{
"processId" : "firstProcessId",
"name" : "firstProcessName",
"startedAt" : null,
"finishedAt" : null,
"status" : "PENDING",
"steps" : [
{
"stepId" : "foo",
"status" : "PENDING",
"startedAt" : null,
"finishedAt" : null
},
{
"stepId" : "bar",
"status" : "PENDING",
"startedAt" : null,
"finishedAt" : null
}
...
]
},
{
"processId" : "secondProcessId",
"name" : "secondProcessName",
"startedAt" : null,
"finishedAt" : null,
"status" : "PENDING",
"steps" : [
{
"stepId" : "foo",
"status" : "PENDING",
"startedAt" : null,
"finishedAt" : null
},
{
"stepId" : "xyz",
"status" : "PENDING",
"startedAt" : null,
"finishedAt" : null
}
...
]
}
]
}
A couple of notes here:
Each flow contains many processes
Each process contains at least one step. Steps with the same ID may appear in different processes (the ID is something the programmer specifies).
A step can be something like "bring me something from the DB", so it is a kind of reusable component in my system.
Now, when the application runs, I would like to call DAO methods such as
"startProcess" and "startStep".
So I would like to know the correct query for starting a step, given the process ID and the step ID.
I can successfully update the process status to "RUNNING" given the flow name and the process ID:
db.getCollection('flows').updateOne({"name" : "flow1", "processes" : {$elemMatch : {"processId" : "firstProcessId"}}}, {$set: {"processes.$.status" : "RUNNING"}})
However, I don't know how to update the step status given the flow ID, process ID, and step ID. It looks like multiple "$" signs are not allowed in the path.
So, this doesn't work:
db.getCollection('flows').updateOne({"name" : "flow1", "processes" : {$elemMatch : {"processId" : "firstProcessId"}}, "processes.steps.stepId" : {$elemMatch : {"stepId" : "foo"}}}, {$set: {"processes.$.steps.$.status" : "RUNNING"}})
What is the best way to implement such an update?
To update an element in a multi-level nested array, you need the filtered positional operator $[<identifier>] together with arrayFilters.
The processes and processes.steps.stepId conditions can be removed from the query filter, because the filtering is performed by arrayFilters.
db.collection.update({
"name": "flow1"
},
{
$set: {
"processes.$[process].steps.$[step].status": "RUNNING"
}
},
{
arrayFilters: [
{
"process.processId": "firstProcessId"
},
{
"step.stepId": "foo"
}
]
})
Sample Mongo Playground
Reference
Update Nested Arrays in Conjunction with $[]
As you mentioned, it does not work with multiple arrays. Straight from the docs:
The positional $ operator cannot be used for queries which traverse more than one array, such as queries that traverse arrays nested within other arrays, because the replacement for the $ placeholder is a single value
I recommend you use arrayFilters instead. Its behavior is much clearer, especially when working with nested structures:
db.collection.updateMany(
{
"name": "flow1",
"processes.processId": "firstProcessId",
"processes.steps.stepId": "foo"
},
{
$set: {
"processes.$[process].steps.$[step].status": "RUNNING"
}
},
{
arrayFilters: [
{
"process.processId": "firstProcessId"
},
{
"step.stepId": "foo"
}
]
})
Mongo Playground
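To make the matching rules of both answers concrete, here is a plain-Java, in-memory sketch of what an update on "processes.$[process].steps.$[step].status" with those two arrayFilters does. It does not use MongoDB or its Java driver at all; the class ArrayFiltersSketch and its helper methods are hypothetical names introduced only for this illustration, and the sample document is a trimmed version of the one in the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory illustration of a filtered positional update (no MongoDB involved). */
final class ArrayFiltersSketch {

    /** Builds a step sub-document with the given id and a PENDING status. */
    static Map<String, Object> step(String stepId) {
        Map<String, Object> s = new LinkedHashMap<>();
        s.put("stepId", stepId);
        s.put("status", "PENDING");
        return s;
    }

    /** Builds a process sub-document that holds one step per given step id. */
    static Map<String, Object> process(String processId, String... stepIds) {
        List<Map<String, Object>> steps = new ArrayList<>();
        for (String id : stepIds) {
            steps.add(step(id));
        }
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("processId", processId);
        p.put("status", "PENDING");
        p.put("steps", steps);
        return p;
    }

    /** Builds a flow shaped like the document in the question. */
    static Map<String, Object> sampleFlow() {
        List<Map<String, Object>> processes = new ArrayList<>();
        processes.add(process("firstProcessId", "foo", "bar"));
        processes.add(process("secondProcessId", "foo", "xyz"));
        Map<String, Object> flow = new LinkedHashMap<>();
        flow.put("name", "flow1");
        flow.put("processes", processes);
        return flow;
    }

    /**
     * Sets the status of every step whose stepId matches, but only inside
     * processes whose processId matches. Returns the number of steps changed.
     */
    @SuppressWarnings("unchecked")
    static int setStepStatus(Map<String, Object> flow, String processId, String stepId, String status) {
        int modified = 0;
        for (Map<String, Object> p : (List<Map<String, Object>>) flow.get("processes")) {
            if (!processId.equals(p.get("processId"))) continue;   // plays the role of the "process" filter
            for (Map<String, Object> s : (List<Map<String, Object>>) p.get("steps")) {
                if (!stepId.equals(s.get("stepId"))) continue;     // plays the role of the "step" filter
                s.put("status", status);
                modified++;
            }
        }
        return modified;
    }

    /** Looks up the status of one step, or returns null if it does not exist. */
    @SuppressWarnings("unchecked")
    static String stepStatus(Map<String, Object> flow, String processId, String stepId) {
        for (Map<String, Object> p : (List<Map<String, Object>>) flow.get("processes")) {
            if (!processId.equals(p.get("processId"))) continue;
            for (Map<String, Object> s : (List<Map<String, Object>>) p.get("steps")) {
                if (stepId.equals(s.get("stepId"))) return (String) s.get("status");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> flow = sampleFlow();
        int changed = setStepStatus(flow, "firstProcessId", "foo", "RUNNING");
        System.out.println("steps changed: " + changed);
        System.out.println("first/foo:  " + stepStatus(flow, "firstProcessId", "foo"));
        System.out.println("second/foo: " + stepStatus(flow, "secondProcessId", "foo"));
    }
}
```

The point of the sketch is that the step "foo" in "secondProcessId" stays untouched even though it has the same stepId: each filter applies at its own nesting level, which is what a single positional "$" cannot express.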

id is being replaced by _id, of the inner object, when updating data of MongoDB using Java Driver

I'm trying to update an object of MongoDB. I'm using Java Driver (Sync).
After a 'create' operation, the data is persisted as follows:
{
"_id" : ObjectId("5f2b7deb62798d1045a47313"),
"name" : "John",
"other_info" : {
"images" : {
"images" : [
{
"id" : "1",
"imgType" : "IDBACKIMAGE"
},
{
"id" : "2",
"imgType" : "SIGCARDIMAGE"
}
]
}
},
"status" : "PENDING"
}
Now, I want to modify the 'id' parameter of the images array. So I update the data using getCollection().updateOne(filterCondition, combine(updateData)), but it is persisted as below:
{
"_id" : ObjectId("5f2b7deb62798d1045a47313"),
"name" : "John",
"other_info" : {
"images" : {
"images" : [
{
"_id" : "3",
"imgType" : "IDBACKIMAGE"
},
{
"_id" : "4",
"imgType" : "SIGCARDIMAGE"
}
]
}
},
"status" : "PENDING"
}
As you can see in the updated data, the 'id' property of the elements in the images array became '_id' after the update operation. I provided the JSON with an 'id' field, but somehow the Mongo client treated 'id' as '_id' and persisted '_id'. This also happens when replaceOne() is used. It doesn't happen with the create operation, as you can see above. Is this expected behavior for an update operation? Why is MongoDB treating 'id' and '_id' as the same?

Spring Mongo criteria on list of list

Hi, I am using Spring Data MongoDB and need to fetch data based on multiple where conditions. The problem arises when I want to apply a where clause to a list nested inside another structure.
For example:
{
"_id" : ObjectId("5982bf9339f3c92b84be4737"),
"_class" : "com.paladion.payment.model.GroupQuestionMapping",
"saqID" : "SAQ A",
"saqVersion" : "3",
"questionTab" : {
"Secure Network" : [
{
"number" : "2.1 (a)",
"question" : "Are vendor-supplied",
"description" : "<ul><li>Review"
},
{
"number" : "2.1 (b)",
"question" : "Are unnecessary",
"description" : "<ul><li>Review policies"
}
],
"Access Control" : [
{
"number" : "2.1 (a)",
"question" : "Are vendor-supplied",
"description" : "<ul><li>Review"
},
{
"number" : "2.1 (b)",
"question" : "Are unnecessary",
"description" : "<ul><li>Review policies"
}
]
}
}
Here I need to fetch data where saqID is "SAQ A", saqVersion is "3", and questionTab is "Secure Network".
I have a problem applying criteria on questionTab.
My code:
Query query = new Query();
query.addCriteria(Criteria.where("saqID").is(saqType)); // field name as stored in the document above
query.addCriteria(Criteria.where("saqVersion").is(saqVersion)); // field names are case-sensitive
query.addCriteria(/* criteria on questionTab */);
query.addCriteria(Criteria.where("questionTab.Secure Network").exists(true));
Note that this returns the full document matching the criteria, so you would have to filter the other questionTab types out of the document yourself.
Another way is aggregation, but I think processing in the application layer might be preferable.
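The application-layer filtering mentioned above could look roughly like the following plain-Java sketch. It assumes questionTab has been mapped to a Map from tab name to a list of questions; the class QuestionTabFilterSketch and the method keepOnly are hypothetical names used only for illustration and are not part of Spring Data.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Keeps a single tab of an already loaded questionTab map (hypothetical helper). */
final class QuestionTabFilterSketch {

    /**
     * Returns a new map that contains only the requested tab.
     * The result is empty when the tab is not present; the input map is not modified.
     */
    static <T> Map<String, List<T>> keepOnly(Map<String, List<T>> questionTab, String tabName) {
        Map<String, List<T>> result = new LinkedHashMap<>();
        List<T> questions = questionTab.get(tabName);
        if (questions != null) {
            result.put(tabName, questions);
        }
        return result;
    }
}
```

A caller would run the query first and then apply something like keepOnly(mapping.getQuestionTab(), "Secure Network"), where getQuestionTab() is an assumed getter on the mapped entity.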

What is wrong with this Cypher query?

I am trying to send the following to Neo4j using the REST interface, specifically the sendTransactionalCypherQuery method given in the 2.2.9 manual's tutorial on using REST from Java. However, as you can see below, I keep getting an error that doesn't seem to have much to do with the query itself.
Any tips on how I can debug this?
CREATE (p:player { props }), "parameters" {
"props" : {
"screen_name" : "testone",
"email" : "test.one@gmail.com",
"rank" : "-12",
"password" : "testonepass",
"details" : "test one details",
"latitude" : "0.0",
"longitude" : "0.0",
"available" : "true",
"publish" : "true" }}
{"results":[],"errors":[{"code":"Neo.ClientError.Request.InvalidFormat","message":"Unable to deserialize request: Unexpected character ('p' (code 112)): was expecting comma to separate OBJECT entries\n at [Source: HttpInputOverHTTP@10401de; line: 1, column: 66]"}]}
From the snippet you posted, it looks like the payload sent to the transactional endpoint is incomplete. You could try this statement in the browser; I copied your statement and formatted it so it can be posted there, so you can at least see it work. The data is clearly being posted, so it seems to come down to formatting.
:POST /db/data/transaction/commit {
"statements": [
{
"statement": "CREATE (p:player { props })",
"parameters":
{
"props" : {
"screen_name" : "testone",
"email" : "test.one@gmail.com",
"rank" : "-12",
"password" : "testonepass",
"details" : "test one details",
"latitude" : "0.0",
"longitude" : "0.0",
"available" : "true",
"publish" : "true" }
}
}
]
}
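On the Java side, the same payload shape can be assembled by keeping the Cypher text and the parameters as separate JSON members instead of concatenating them into one query string. The following is a minimal sketch using only the standard library; the class CypherPayloadSketch and its methods are hypothetical names, it only builds the request body (sending it is not shown), and its escaping covers backslashes and double quotes only, so a real JSON library would be the safer choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Builds the JSON body for the transactional endpoint (illustrative helper, not a Neo4j API). */
final class CypherPayloadSketch {

    /** Wraps text in double quotes, escaping backslashes and double quotes only. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Returns a body of the form
     * {"statements": [{"statement": ..., "parameters": {paramName: {...props...}}}]}
     * with all property values sent as strings, as in the question.
     */
    static String payload(String statement, String paramName, Map<String, String> props) {
        StringJoiner fields = new StringJoiner(", ", "{", "}");
        for (Map.Entry<String, String> e : props.entrySet()) {
            fields.add(quote(e.getKey()) + ": " + quote(e.getValue()));
        }
        return "{\"statements\": [{\"statement\": " + quote(statement)
                + ", \"parameters\": {" + quote(paramName) + ": " + fields + "}}]}";
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("screen_name", "testone");
        props.put("rank", "-12");
        System.out.println(payload("CREATE (p:player { props })", "props", props));
    }
}
```

The error in the question is consistent with the whole text, including the "parameters" part, ending up inside the "statement" string, where its embedded quotes break the surrounding JSON; keeping the two apart avoids that.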

MongoDB JSON array within JSON object field removal

I have a JSON object as follows:
{ "_id" : ObjectId("508806803bb97dc546e6f307"), "user_name" : "user1", "user_id" : 45645645, "likes" : [ { "event_id" : NumberLong("4578541212") },{ "event_id" : NumberLong("4578541213") } ], "dislikes" : [ ] }
I'm trying to delete a specific event from the likes array via the Java driver.
I tried doing this first in the shell:
> db.users.update( {'likes.event_id' : 4578541212}, { '$unset':{'likes.event_id' :1}})
with no luck. How can I do that?
If you want to just remove the event_id field from the array element:
db.users.update( {'likes.event_id' : 4578541212}, {'$unset':{'likes.$.event_id' :1}})
Use the $pull operator to delete the element:
db.users.update({'likes.event_id': 4578541212}, {'$pull':{likes: {event_id: 4578541212}}})
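The difference between the two commands can be illustrated with a plain-Java, in-memory sketch (no MongoDB driver involved). The class PullVsUnsetSketch and its methods are hypothetical names that only mimic the effect each operator has on the likes array.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory illustration of $unset on a matched element versus $pull (no MongoDB involved). */
final class PullVsUnsetSketch {

    /** Builds a likes array with one {event_id: ...} element per given id. */
    static List<Map<String, Object>> likes(long... eventIds) {
        List<Map<String, Object>> likes = new ArrayList<>();
        for (long id : eventIds) {
            Map<String, Object> like = new LinkedHashMap<>();
            like.put("event_id", id);
            likes.add(like);
        }
        return likes;
    }

    /** Mimics $unset on 'likes.$.event_id': the field goes away, the (now empty) element stays. */
    static void unsetFirstMatch(List<Map<String, Object>> likes, long eventId) {
        for (Map<String, Object> like : likes) {
            if (Long.valueOf(eventId).equals(like.get("event_id"))) {
                like.remove("event_id");
                return; // the positional $ refers to the first matching element only
            }
        }
    }

    /** Mimics $pull: every element that matches the condition is removed from the array. */
    static void pull(List<Map<String, Object>> likes, long eventId) {
        likes.removeIf(like -> Long.valueOf(eventId).equals(like.get("event_id")));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> a = likes(4578541212L, 4578541213L);
        unsetFirstMatch(a, 4578541212L);
        System.out.println(a); // prints [{}, {event_id=4578541213}]

        List<Map<String, Object>> b = likes(4578541212L, 4578541213L);
        pull(b, 4578541212L);
        System.out.println(b); // prints [{event_id=4578541213}]
    }
}
```

So the first command leaves an empty sub-document behind in likes, while the second one removes the entry entirely, which is usually what "delete a specific event" means.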
