Compare two JSON structures and get the mismatched changes - Java

Hi, I have JSON structures #1 and #2 as follows. I would like to compare them and capture the differences.
JSON #1:
{
"menu": {
"id": "file",
"popup": {
"menuitem": {
"menuitem-1": "sometext",
"menuitem-2": {
"menuitem-2.1": "sometext",
"menuitem-2.2": "sometext",
"menuitem-2.3": {
"menuitem-2.3.1": "sometext"
}
}
}
},
"value": "File"
}
}
JSON #2:
{
"menu": {
"id": "file",
"popup": {
"menuitem": {
"menuitem-2.3": {
"menuitem-2.3.1": "sometext"
},
"menuitem-1": "sometext",
"menuitem-2": {
"menuitem-2.1": "sometext",
"menuitem-2.2": "sometext"
}
}
},
"value": "File"
}
}
I am expecting to detect that the JSON below has been moved up in JSON #2. My goal here is to identify any CREATE NEW / UPDATE / ADJUSTED (moved) / DELETE changes in JSON #2.
"menuitem-2.3": {
"menuitem-2.3.1": "sometext"
}
Is there any existing Spring / Java framework available to achieve the above?

Use difference from org.apache.commons.lang.StringUtils.
Compares two Strings, and returns the portion where they differ. (More precisely, return the remainder of the second String, starting from where it's different from the first.)
For example,
difference("i am a machine", "i am a robot") -> "robot".
StringUtils.difference(null, null) = null
StringUtils.difference("", "") = ""
StringUtils.difference("", "abc") = "abc"
StringUtils.difference("abc", "") = ""
StringUtils.difference("abc", "abc") = ""
StringUtils.difference("ab", "abxyz") = "xyz"
StringUtils.difference("abcde", "abxyz") = "xyz"
StringUtils.difference("abcde", "xyz") = "xyz"
Parameters:
str1 - the first String, may be null
str2 - the second String, may be null
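For illustration, a minimal sketch applying it to two JSON documents serialized as strings (the shortened string literals below stand in for JSON #1 and #2):
import org.apache.commons.lang.StringUtils;

public class JsonStringDiff {
    public static void main(String[] args) {
        // Shortened stand-ins for the JSON documents above
        String json1 = "{\"menu\":{\"id\":\"file\",\"value\":\"File\"}}";
        String json2 = "{\"menu\":{\"id\":\"file\",\"value\":\"Edit\"}}";
        // Prints the remainder of json2 from the first differing character: Edit"}}
        System.out.println(StringUtils.difference(json1, json2));
    }
}
Note that this is a purely textual comparison: a key that merely moved, as in JSON #2 above, shows up as a difference from the first divergent character onward rather than as a structural move, so it only partially addresses the CREATE / UPDATE / ADJUSTED / DELETE goal.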

Try using Apache Drill. It is easy to install and supports querying JSON. You can then execute a minus query and get the difference.
You can also query Drill from Java; Apache Drill has a JDBC driver for that.
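As a rough sketch of the JDBC route (assuming an embedded/local Drillbit; the dfs file path is illustrative and the exact set-difference query depends on your Drill version):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DrillJsonQuery {
    public static void main(String[] args) throws Exception {
        // Apache Drill's JDBC driver; "zk=local" targets a local/embedded Drillbit
        try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
             Statement stmt = conn.createStatement();
             // The dfs path is illustrative; point it at one of your JSON files
             ResultSet rs = stmt.executeQuery(
                     "SELECT t.menu.id FROM dfs.`/tmp/json1.json` t")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}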
Hope it helps. :)

Google DLP - Can I use a delimiter to instruct DLP infotype detectors to search only inside that for sensitive text?

I have an issue while trying to de-identify some data with DLP. I use an object mapper to parse the object into a string, send it to DLP for de-identification, get back the de-identified string, and use the object mapper to parse the string back into the initial object. Sometimes DLP will return a string that cannot be parsed back to the initial object (it breaks the JSON format expected by the object mapper).
I use an ObjectMapper to parse an Address object to a string like this:
data class Address(
val postal_code: String,
val street: String,
val city: String,
val provence: String
)
and my object mapper will transform this object into a string, e.g. "{\"postal_code\":\"123ABC\",\"street\":\"Street Name\",\"city\":\"My City\",\"provence\":\"My Provence\"}", which is sent to DLP and de-identified (using LOCATION or STREET_ADDRESS detectors).
The issue is that my object mapper expects to take the de-identified string back and parse it into my Address object using the same JSON format, e.g.:
"{\"postal_code\":\"LOCATION_TOKEN(10):asdf\",\"street\":\"LOCATION_TOKEN(10):asdf\",\"city\":\"LOCATION_TOKEN(10):asdf\",\"provence\":\"LOCATION_TOKEN(10):asdf\"}"
But there are a lot of times that DLP will return something like
"{"LOCATION_TOKEN(25):asdfasdfasdf)\",\"provence\":\"LOCATION_TOKEN(10):asdf\"}" - basically breaking the json format and i am unable to parse back the string from DLP to my initial object
Is there a way to instruct DLP infotype detectors to keep the json format, or to look for sensitive text only inside \" * \"?
Thanks
There are some options here using a custom regex and a detection ruleset in order to define a boundary on matches.
The general idea is that you require that findings must match both an infoType (e.g. STREET_ADDRESS, LOCATION, PERSON_NAME, etc.) and your custom infoType before reporting as a finding or for redaction. By requiring that both match, you can set bounds on where the infoType can detect.
Here is an example.
{
"item": {
"value": "{\"postal_code\":\"123ABC\",\"street\":\"Street Name\",\"city\":\"My City\",\"provence\":\"My Provence\"}"
},
"inspectConfig": {
"customInfoTypes": [
{
"infoType": {
"name": "CUSTOM_BLOCK"
},
"regex": {
"pattern": "(:\")([^,]*)(\")",
"groupIndexes": [
2
]
},
"exclusionType": "EXCLUSION_TYPE_EXCLUDE"
}
],
"infoTypes": [
{
"name": "EMAIL_ADDRESS"
},
{
"name": "LOCATION"
},
{
"name": "PERSON_NAME"
}
],
"ruleSet": [
{
"infoTypes": [
{
"name": "LOCATION"
}
],
"rules": [
{
"exclusionRule": {
"excludeInfoTypes": {
"infoTypes": [
{
"name": "CUSTOM_BLOCK"
}
]
},
"matchingType": "MATCHING_TYPE_INVERSE_MATCH"
}
}
]
}
]
},
"deidentifyConfig": {
"infoTypeTransformations": {
"transformations": [
{
"primitiveTransformation": {
"replaceWithInfoTypeConfig": {}
}
}
]
}
}
}
Example output:
"item": {
"value": "{\"postal_code\":\"123ABC\",\"street\":\"Street Name\",\"city\":\"My City\",\"provence\":\"My [LOCATION]\"}"
},
By setting "groupIndexes" to 2 we are indicating that we only want the custom infoType to match the middle (or second) regex group and not allow the :" or " to be part of the match. Also, in this example we mark the custom infoType as EXCLUSION_TYPE_EXCLUDE so that it does not report itself:
"exclusionType": "EXCLUSION_TYPE_EXCLUDE"
If you remove this line, anything matching your infoType could also get redacted. This can be useful for testing though - example output:
"item": {
"value": "{\"postal_code\":\"[CUSTOM_BLOCK]\",\"street\":\"[CUSTOM_BLOCK]\",\"city\":\"[CUSTOM_BLOCK]\",\"provence\":\"[CUSTOM_BLOCK][LOCATION]\"}"
},
...
Hope this helps.

How to convert a file to a String which is accepted in JSON?

I am trying to create gists in Github via REST ASSURED.
To create a gist I need to pass file names and their contents.
Now, the content of the file is something which is being rejected by the API.
Example:
{
"description": "Hello World Examples",
"public": true,
"files": {
"hello_world.rb": {
"content": "class HelloWorld\n def initialize(name)\n #name = name.capitalize\n end\n def sayHi\n puts \"Hello !\"\n end\nend\n\nhello = HelloWorld.new(\"World\")\nhello.sayHi"
},
"hello_world.py": {
"content": "class HelloWorld:\n\n def init(self, name):\n self.name = name.capitalize()\n \n def sayHi(self):\n print \"Hello \" + self.name + \"!\"\n\nhello = HelloWorld(\"world\")\nhello.sayHi()"
},
"hello_world_ruby.txt": {
"content": "Run ruby hello_world.rb to print Hello World"
},
"hello_world_python.txt": {
"content": "Run python hello_world.py to print Hello World"
}
}
}
The above is how the API wants the JSON to be; this is what I could get via my code:
{
"description": "Happy World",
"public": true,
"files": {
"sid.java": {
"content": "Ce4z5e22ta"
},
"siddharth.py": {
"content": "def a:
if sidh>kundu:
sid==kundu
else:
kundu==sid
"
}
}
}
So the change in the indentation is causing the GitHub API to fail this with a 400 error. Can someone please help?
As pointed out in the comments, JSON does not allow control characters in strings. In the case of line breaks, these were encoded as \n in the example.
You should definitely consider using a proper library to create the JSON rather than handling the raw strings yourself.
Create a POJO which will represent your gist (i.e. an object with fields like 'description' and a 'files' collection), and a separate POJO for a file containing string fields 'name' and 'content'.
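A minimal sketch of such POJOs, using Jackson annotations (the class and field names are assumptions chosen to match the snippet below; @JsonProperty maps the reserved word 'public'):
import com.fasterxml.jackson.annotation.JsonIgnore;
import com.fasterxml.jackson.annotation.JsonProperty;
import java.util.HashMap;
import java.util.Map;

class GistFile {
    @JsonIgnore
    public String name;    // used as the key in the files map, not serialized inside the file object
    public String content; // Jackson escapes line breaks in this string as \n for you
}

class Gist {
    public String description;
    @JsonProperty("public") // "public" is a Java keyword, so map it explicitly
    public boolean isPublic;
    public Map<String, GistFile> files = new HashMap<>();

    public void addFile(GistFile file) {
        files.put(file.name, file);
    }
}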
Do something like this to convert your gist:
try {
GistFile file = new GistFile(); // Assuming this is the POJO for your file
//Set name and content
Gist gist = new Gist(); //Assuming this is the POJO for your gist
gist.addFile(file);
//Add more files if needed and set other properties
ObjectMapper mapper = new ObjectMapper();
String content = mapper.writeValueAsString(gist);
//Now you have valid JSON string
} catch (Exception e) {
e.printStackTrace();
}
This is for com.fasterxml.jackson.databind.ObjectMapper; alternatively, use a different JSON library.
Actually, there are GitHub-specific libraries which do most of the job for you. Please refer to this question: How to connect to github using Java Program. It might be helpful.

DB script for changing the model of a MongoDB collection [duplicate]

In MongoDB, is it possible to update the value of a field using the value from another field? The equivalent SQL would be something like:
UPDATE Person SET Name = FirstName + ' ' + LastName
And the MongoDB pseudo-code would be:
db.person.update( {}, { $set : { name : firstName + ' ' + lastName } } );
The best way to do this is in version 4.2+, which allows using the aggregation pipeline in the update document with the updateOne, updateMany, or update (deprecated in most, if not all, language drivers) collection methods.
MongoDB 4.2+
Version 4.2 also introduced the $set pipeline stage operator, which is an alias for $addFields. I will use $set here as it maps with what we are trying to achieve.
db.collection.<update method>(
{},
[
{"$set": {"name": { "$concat": ["$firstName", " ", "$lastName"]}}}
]
)
Note that square brackets in the second argument to the method specify an aggregation pipeline instead of a plain update document because using a simple document will not work correctly.
MongoDB 3.4+
In 3.4+, you can use $addFields and the $out aggregation pipeline operators.
db.collection.aggregate(
[
{ "$addFields": {
"name": { "$concat": [ "$firstName", " ", "$lastName" ] }
}},
{ "$out": <output collection name> }
]
)
Note that this does not update your collection but instead replaces the existing collection or creates a new one. Also, for update operations that require "typecasting", you will need client-side processing, and depending on the operation, you may need to use the find() method instead of the .aggregate() method.
MongoDB 3.2 and 3.0
The way we do this is by $projecting our documents and using the $concat string aggregation operator to return the concatenated string.
You then iterate the cursor and use the $set update operator to add the new field to your documents using bulk operations for maximum efficiency.
Aggregation query:
var cursor = db.collection.aggregate([
{ "$project": {
"name": { "$concat": [ "$firstName", " ", "$lastName" ] }
}}
])
MongoDB 3.2 or newer
You need to use the bulkWrite method.
var requests = [];
cursor.forEach(document => {
requests.push( {
'updateOne': {
'filter': { '_id': document._id },
'update': { '$set': { 'name': document.name } }
}
});
if (requests.length === 500) {
//Execute per 500 operations and re-init
db.collection.bulkWrite(requests);
requests = [];
}
});
if(requests.length > 0) {
db.collection.bulkWrite(requests);
}
MongoDB 2.6 and 3.0
From this version, you need to use the now deprecated Bulk API and its associated methods.
var bulk = db.collection.initializeUnorderedBulkOp();
var count = 0;
cursor.snapshot().forEach(function(document) {
bulk.find({ '_id': document._id }).updateOne( {
'$set': { 'name': document.name }
});
count++;
if(count%500 === 0) {
// Execute per 500 operations and re-init
bulk.execute();
bulk = db.collection.initializeUnorderedBulkOp();
}
})
// clean up queues
if(count > 0) {
bulk.execute();
}
MongoDB 2.4
cursor["result"].forEach(function(document) {
db.collection.update(
{ "_id": document._id },
{ "$set": { "name": document.name } }
);
})
You should iterate through. For your specific case:
db.person.find().snapshot().forEach(
function (elem) {
db.person.update(
{
_id: elem._id
},
{
$set: {
name: elem.firstname + ' ' + elem.lastname
}
}
);
}
);
Apparently there is a way to do this efficiently since MongoDB 3.4, see styvane's answer.
Obsolete answer below
You cannot refer to the document itself in an update (yet). You'll need to iterate through the documents and update each document using a function. See this answer for an example, or this one for server-side eval().
For a database with high activity, you may run into issues where your updates affect actively changing records; for this reason, I recommend using snapshot():
db.person.find().snapshot().forEach( function (hombre) {
hombre.name = hombre.firstName + ' ' + hombre.lastName;
db.person.save(hombre);
});
http://docs.mongodb.org/manual/reference/method/cursor.snapshot/
Starting Mongo 4.2, db.collection.update() can accept an aggregation pipeline, finally allowing the update/creation of a field based on another field:
// { firstName: "Hello", lastName: "World" }
db.collection.updateMany(
{},
[{ $set: { name: { $concat: [ "$firstName", " ", "$lastName" ] } } }]
)
// { "firstName" : "Hello", "lastName" : "World", "name" : "Hello World" }
The first part {} is the match query, filtering which documents to update (in our case all documents).
The second part [{ $set: { name: { ... } } }] is the update aggregation pipeline (note the square brackets signifying the use of an aggregation pipeline). $set is a new aggregation operator and an alias of $addFields.
Regarding this answer, the snapshot function is deprecated in version 3.6, according to this update. So, on version 3.6 and above, it is possible to perform the operation this way:
db.person.find().forEach(
function (elem) {
db.person.update(
{
_id: elem._id
},
{
$set: {
name: elem.firstname + ' ' + elem.lastname
}
}
);
}
);
I tried the above solution but I found it unsuitable for large amounts of data. I then discovered the stream feature:
MongoClient.connect("...", function(err, db){
var c = db.collection('yourCollection');
var s = c.find({/* your query */}).stream();
s.on('data', function(doc){
c.update({_id: doc._id}, {$set: {name : doc.firstName + ' ' + doc.lastName}}, function(err, result) { /* result == true? */ });
});
s.on('end', function(){
// stream can end before all your updates do if you have a lot
})
})
The update() method takes an aggregation pipeline as a parameter, like:
db.collection_name.update(
{
// Query
},
[
// Aggregation pipeline
{ "$set": { "id": "$_id" } }
],
{
// Options
"multi": true // false when a single doc has to be updated
}
)
The field can be set or unset with existing values using the aggregation pipeline.
Note: use $ with the field name to specify the field which has to be read.
Here's what we came up with for copying one field to another for ~150_000 records. It took about 6 minutes, but it is still significantly less resource-intensive than it would have been to instantiate and iterate over the same number of Ruby objects.
js_query = %({
$or : [
{
'settings.mobile_notifications' : { $exists : false },
'settings.mobile_admin_notifications' : { $exists : false }
}
]
})
js_for_each = %(function(user) {
if (!user.settings.hasOwnProperty('mobile_notifications')) {
user.settings.mobile_notifications = user.settings.email_notifications;
}
if (!user.settings.hasOwnProperty('mobile_admin_notifications')) {
user.settings.mobile_admin_notifications = user.settings.email_admin_notifications;
}
db.users.save(user);
})
js = "db.users.find(#{js_query}).forEach(#{js_for_each});"
Mongoid::Sessions.default.command('$eval' => js)
With MongoDB version 4.2+, updates are more flexible, as they allow the use of the aggregation pipeline in update, updateOne, and updateMany. You can now transform your documents using the aggregation operators and then update, without the need to explicitly state the $set command (instead we use $replaceRoot: {newRoot: "$$ROOT"}).
Here we use the aggregation query to extract the timestamp from MongoDB's ObjectID "_id" field and update the documents. (I am not an expert in SQL, but I think SQL does not provide an auto-generated ObjectID that carries a timestamp; you would have to create that date yourself.)
var collection = "person"
agg_query = [
{
"$addFields" : {
"_last_updated" : {
"$toDate" : "$_id"
}
}
},
{
$replaceRoot: {
newRoot: "$$ROOT"
}
}
]
db.getCollection(collection).updateMany({}, agg_query, {upsert: true})
(I would have posted this as a comment, but couldn't)
For anyone who lands here trying to update one field using another in the document with the C# driver...
I could not figure out how to use any of the UpdateXXX methods and their associated overloads since they take an UpdateDefinition as an argument.
// we want to set Prop1 to Prop2
class Foo { public string Prop1 { get; set; } public string Prop2 { get; set;} }
void Test()
{
var update = new UpdateDefinitionBuilder<Foo>();
update.Set(x => x.Prop1, <new value; no way to get a hold of the object that I can find>)
}
As a workaround, I found that you can use the RunCommand method on an IMongoDatabase (https://docs.mongodb.com/manual/reference/command/update/#dbcmd.update).
var command = new BsonDocument
{
{ "update", "CollectionToUpdate" },
{ "updates", new BsonArray
{
new BsonDocument
{
// Any filter; here the check is if Prop1 does not exist
{ "q", new BsonDocument{ ["Prop1"] = new BsonDocument("$exists", false) }},
// set it to the value of Prop2
{ "u", new BsonArray { new BsonDocument { ["$set"] = new BsonDocument("Prop1", "$Prop2") }}},
{ "multi", true }
}
}
}
};
database.RunCommand<BsonDocument>(command);
MongoDB 4.2+ Golang
result, err := collection.UpdateMany(ctx, bson.M{},
    mongo.Pipeline{
        bson.D{{"$set",
            bson.M{"name": bson.M{"$concat": []string{"$lastName", " ", "$firstName"}}},
        }},
    },
)

Returning a specific property from JSON where a condition matches

I'm using rest-assured and trying to access the ids where the condition liked = false matches, using rest-assured JsonPath.
{
"data": {
"content": [{
liked=true,
id=7fe9cb9a-51e9-e611-80bb-000c297d31d1
},
{
liked=true,
id=f60a496d-e5d1-e611-80ba-000c297d31d1
},
{
liked=false,
id=4fb4abfb-8ac3-e611-80ba-000c297d31d1
}]
}
}
The following code returns all the objects, but I need only the id property list. How can I do that?
List<String> aa = response.get("data.content.findAll { content -> content.isLiked == false }");
Using this solution returns only the required property matching the condition:
ReadContext rx = JsonPath.parse(resp.getBody().asString());
List<String> asdf = rx.read("$.data.content[?(@.isLiked == false)].id");
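Alternatively, a minimal sketch that stays within rest-assured's own Groovy GPath support (assuming the field is really named isLiked, as in the snippets above, and that `response` is an io.restassured.response.Response):
import java.util.List;

// findAll filters the content array, then .id collects just that property
List<String> ids = response.jsonPath()
        .getList("data.content.findAll { it.isLiked == false }.id");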

Case insensitive sorting in MongoDB

How can I sort a MongoDB collection by a given field, case-insensitively? By default, I get A-Z before a-z.
Update:
As of now, MongoDB has case-insensitive indexes:
Users.find({})
.collation({locale: "en" })
.sort({name: 1})
.exec()
.then(...)
shell:
db.getCollection('users')
.find({})
.collation({'locale':'en'})
.sort({'firstName':1})
Update: This answer is out of date; 3.4 will have case-insensitive indexes. See the JIRA for more information: https://jira.mongodb.org/browse/SERVER-90
Unfortunately MongoDB does not yet have case insensitive indexes: https://jira.mongodb.org/browse/SERVER-90 and the task has been pushed back.
This means the only way to sort case-insensitively at the moment is to actually create a dedicated "lower cased" field, copying the value (lower cased, of course) of the sort field in question, and sorting on that instead.
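For example, a minimal sketch of that workaround with the legacy Java driver (the collection and field names are illustrative, and `db` is assumed to be a connected com.mongodb.DB instance):
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;

// Copy a lower-cased version of "name" into "name_lower" for every document
DBCollection coll = db.getCollection("users");
DBCursor cursor = coll.find();
while (cursor.hasNext()) {
    DBObject doc = cursor.next();
    String name = (String) doc.get("name");
    coll.update(new BasicDBObject("_id", doc.get("_id")),
            new BasicDBObject("$set", new BasicDBObject("name_lower", name.toLowerCase())));
}
// Sorting on the copy is now effectively case-insensitive
DBCursor sorted = coll.find().sort(new BasicDBObject("name_lower", 1));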
Sorting does not work like that in MongoDB, but you can do this on the fly with aggregate:
Take the following data:
{ "field" : "BBB" }
{ "field" : "aaa" }
{ "field" : "AAA" }
So with the following statement:
db.collection.aggregate([
{ "$project": {
"field": 1,
"insensitive": { "$toLower": "$field" }
}},
{ "$sort": { "insensitive": 1 } }
])
Would produce results like:
{
"field" : "aaa",
"insensitive" : "aaa"
},
{
"field" : "AAA",
"insensitive" : "aaa"
},
{
"field" : "BBB",
"insensitive" : "bbb"
}
The actual order of insertion would be maintained for any values resulting in the same key when converted.
This has been an issue on the MongoDB JIRA for quite a long time, but it is solved now. Take a look at the release notes for detailed documentation. You should use collation.
User.find()
.collation({locale: "en" }) //or whatever collation you want
.sort({name:1})
.exec(function(err, users) {
// use your case insensitive sorted results
});
Adding the code .collation({'locale':'en'}) helped to solve my issue.
As of now (MongoDB 4), you can do the following:
mongo shell:
db.getCollection('users')
.find({})
.collation({'locale':'en'})
.sort({'firstName':1});
mongoose:
Users.find({})
.collation({locale: "en" })
.sort({name: 1})
.exec()
.then(...)
Here are the languages and locales supported by MongoDB.
In Mongoose:
Customer.find()
.collation({locale: "en" })
.sort({company: 1})
Here it is in Java. I mixed the no-args and first key-val variants of BasicDBObject just for variety:
DBCollection coll = db.getCollection("foo");
List<DBObject> pipe = new ArrayList<DBObject>();
DBObject prjflds = new BasicDBObject();
prjflds.put("field", 1);
prjflds.put("insensitive", new BasicDBObject("$toLower", "$field"));
DBObject project = new BasicDBObject();
project.put("$project", prjflds);
pipe.add(project);
DBObject sort = new BasicDBObject();
sort.put("$sort", new BasicDBObject("insensitive", 1));
pipe.add(sort);
AggregationOutput agg = coll.aggregate(pipe);
for (DBObject result : agg.results()) {
System.out.println(result);
}
If you want to sort and return all data in a document, you can add document: "$$ROOT"
db.collection.aggregate([
{
$project: {
field: 1,
insensitive: { $toLower: "$field" },
document: "$$ROOT"
}
},
{ $sort: { insensitive: 1 } }
]).toArray()
I tried all the above answers.
Consolidating the results:
Answer-1:
db.collection.aggregate([
{ "$project": {
"field": 1,
"insensitive": { "$toLower": "$field" }
}},
{ "$sort": { "insensitive": 1 } } ])
The aggregate query converts the field to lower case, so performance is low for large data sets.
Answer-2:
db.collection.find({}).collation({locale: "en"}).sort({"name":1})
By default, Mongo follows UTF-8 encoding rules (uppercase Z has higher priority than lowercase a), so we override them with language-specific rules.
This is fast compared to the above query.
See the official documentation to customize the rules:
https://docs.mongodb.com/manual/reference/collation/
We solved this problem with the help of the .sort function on a JavaScript array.
Here is the code
function foo() {
let results = collections.find({
_id: _id
}, {
fields: {
'username': 1,
}
}).fetch();
results.sort((a, b) => {
    var nameA = a.username.toUpperCase();
    var nameB = b.username.toUpperCase();
    if (nameA < nameB) {
        return -1;
    }
    if (nameA > nameB) {
        return 1;
    }
    return 0;
});
return results;
}
