Tuples and bags to JSON - Java

I had 4 files. I joined them using Pig, grouped the data as required, and obtained the final output. My input now looks something like this:
({(9723,(N,N)),({({(11,G),(H,House),(1,1ST),(02/25/2015)}),({(10,L),(H,House),(16,EMPTY),(02/25/2015)})})})
That is my Pig output.
I want to convert it into JSON.
My output should look like this:
{
"department": {
"department_id": "9723",
"department_group": {
"flag1": "N",
"flag2": "N"
},
"employee_detail1": {
"employee_type": {
"code": "11",
"name": "G"
},
"employee_level": {
"code": "H",
"name": "House"
},
"employee_dmg": {
"code": "1",
"name": "1st"
},
"DOJ": "02/25/2015"
},
"employee_detail2": {
"employee_type": {
"code": "10",
"name": "L"
},
"employee_level": {
"code": "H",
"name": "House"
},
"employee_dmg": {
"code": "0",
"name": "No"
},
"DOJ": "02/25/2015"
}
}
}
There are 2 bags (meaning 2 employee details), grouped by emp_id and employee group (a tuple with flag1 and flag2).
Can someone suggest the best way to convert this into JSON?

You can STORE your data with JsonStorage; it handles bags nicely.

Related

How to get json key values by another key value

I have a JSON output like this:
{
"items": [
{
"id": "1",
"name": "Anna",
"values": [
{
"code": "Latin",
"grade": 1
},
{
"code": "Maths",
"grade": 5
}
]
},
{
"id": "2",
"name": "Mark",
"values": [
{
"code": "Latin",
"grade": 5
},
{
"code": "Maths",
"grade": 5
}
]
}
]
}
I need to get the field values for "name": "Anna". I am getting a RestAssured Response and would like to use my beans to do that, but I could also use jsonPath() or jsonObject(); I just don't know how. I searched many topics but did not find anything.
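One possible way to do this with beans is sketched below. It assumes the response has already been deserialized into bean objects (with RestAssured this is typically done through `response.as(...)`); the classes `Item`, `Grade`, and `ItemLookup` are hypothetical names introduced only for this illustration, and the sample data is built by hand so the sketch runs without any library.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical bean for one entry of the "values" array.
final class Grade {
    final String code;
    final int grade;

    Grade(String code, int grade) {
        this.code = code;
        this.grade = grade;
    }
}

// Hypothetical bean for one entry of the "items" array.
final class Item {
    final String id;
    final String name;
    final List<Grade> values;

    Item(String id, String name, List<Grade> values) {
        this.id = id;
        this.name = name;
        this.values = values;
    }
}

final class ItemLookup {
    // Returns the "values" of the first item whose name matches, or an empty list.
    static List<Grade> valuesFor(List<Item> items, String name) {
        return items.stream()
                .filter(item -> name.equals(item.name))
                .findFirst()
                .map(item -> item.values)
                .orElse(Collections.emptyList());
    }

    // Hand-built data equivalent to the JSON in the question.
    static List<Item> sampleItems() {
        return Arrays.asList(
                new Item("1", "Anna", Arrays.asList(new Grade("Latin", 1), new Grade("Maths", 5))),
                new Item("2", "Mark", Arrays.asList(new Grade("Latin", 5), new Grade("Maths", 5))));
    }

    public static void main(String[] args) {
        for (Grade g : valuesFor(sampleItems(), "Anna")) {
            System.out.println(g.code + " = " + g.grade);
        }
    }
}
```

If you prefer not to use beans, a GPath expression such as `response.jsonPath().getList("items.find { it.name == 'Anna' }.values")` should select the same list, though that variant is not exercised in the sketch above.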

Scala - Ids lists of objects with duplicated values from spark dataset

I need to create an IDs lists for all objects that have identical (same value and quantity) parameters. I am looking for a solution that will be more efficient than two nested loops and an if.
Object structure in the dataset:
case class MergedProduct(id: String,
products: List[Product])
case class Product(productUrl: String, productId: String)
Example of data in dataset:
[ {
"id": "ID1",
"products": [
{
"product": {
"productUrl": "SOMEURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
}
]
},
{
"id": "ID2",
"products": [
{
"product": {
"productUrl": "SOMEURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
}
]
},
{
"id": "ID3",
"products": [
{
"product": {
"productUrl": "DIFFERENTURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
}
]
},
{
"id": "ID4",
"products": [
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "DIFFERENTURL",
"productId": "1"
}
}
]
},
{
"id": "ID5",
"products": [
{
"product": {
"productUrl": "NOTDUPLICATEDURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "DIFFERENTURL",
"productId": "1"
}
}
]
}
]
In this example, 4 objects are duplicated (ID1 matches ID2, and ID3 matches ID4), so I would like to get their IDs in the corresponding lists.
Example output is List[List[String]]:
List(List("ID1", "ID2"), List("ID3","ID4"))
I am looking for something efficient and readable - the dataset we are talking about has nearly 700 million objects.
The only goal is to log that the duplicates exist, and I can remove the listed duplicates from the dataset (it does not affect the database). So I was thinking about taking each MergedProduct in turn, searching for other MergedProducts with identical Products, getting their IDs, logging that they exist, removing those IDs from the dataset, and moving on to the next one until the whole dataset has been checked. In that case, however, I would first have to collect the dataset as a list of MergedProducts and then do all the operations, which seems like a roundabout approach.
After trying some options and looking for neat solutions, I think this is kind of OK:
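private def getDuplicates(mergedProducts: List[MergedProduct]): List[List[String]] = {
  // Sort by both fields so the same products in a different order produce the same grouping key
  val duplicates = mergedProducts.groupBy(_.products.sortBy(p => (p.productId, p.productUrl))).filter(_._2.size > 1).values.toList
  // Keep only the IDs of each group of duplicates
  duplicates.map(group => group.map(_.id))
}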

How to count by attribute in JSON?

I have the following JSON:
{
"items": [
{
"id": "1",
"name": "John",
"location": {
"town": {
"id": "10"
},
"address": "600 Fake Street"
},
"creation_date": "2010-01-19",
"last_modified_date": "2017-05-18"
},
{
"id": "2",
"name": "Sarah",
"location": {
"town": {
"id": "10"
},
"address": "76 Evergreen Street"
},
"creation_date": "2010-01-19",
"last_modified_date": "2017-05-18"
},
{
"id": "3",
"name": "Hamed",
"location": {
"town": {
"id": "20"
},
"address": "50 East A Street"
},
"creation_date": "2010-01-19",
"last_modified_date": "2017-05-18"
}
]
}
And I need to get something like this, count how many times each townId appears:
[ { "10": 2 }, {"20": 1 }]
I'm trying to find the most efficient way to do this. Any ideas?
The most efficient way is to load the string into a StringBuilder and remove all line breaks and whitespace. Then search for the index of the string "town":{"id":" (the start of a town id) and then for the end index (the string "}). Using the two indexes you can extract the town ids and count them.
There is no need to deserialize the JSON into POJOs and then extract the values from them by XPath :)
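A minimal sketch of this text-scanning idea is shown below. It is a variation on the answer rather than a literal implementation: instead of stripping whitespace and calling indexOf, it uses a regular expression that tolerates whitespace between tokens. The class name `TownCounter` is made up for the example, and the approach is brittle by nature (it assumes the town object always starts with its "id" field and that ids contain no escaped quotes).

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TownCounter {
    // Matches "town": { "id": "<value>" with arbitrary whitespace between tokens.
    private static final Pattern TOWN_ID =
            Pattern.compile("\"town\"\\s*:\\s*\\{\\s*\"id\"\\s*:\\s*\"([^\"]*)\"");

    // Counts how many times each town id occurs in the raw JSON text.
    static Map<String, Integer> countTownIds(String json) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher matcher = TOWN_ID.matcher(json);
        while (matcher.find()) {
            counts.merge(matcher.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String json = "{ \"items\": ["
                + "{ \"id\": \"1\", \"location\": { \"town\": { \"id\": \"10\" } } },"
                + "{ \"id\": \"2\", \"location\": { \"town\": { \"id\": \"10\" } } },"
                + "{ \"id\": \"3\", \"location\": { \"town\": { \"id\": \"20\" } } } ] }";
        System.out.println(countTownIds(json)); // prints {10=2, 20=1}
    }
}
```

If the layout of the JSON can vary, counting over a properly parsed document is the safer choice, at the cost of the parsing step this answer tries to avoid.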

Nested JSON parsing using Java

{
"transaction": {
"id": 1,
"empid": "12345",
"details1": {
"name": "xyz",
"age": "30",
"sex": "M",
"Address": {
"Office": "office",
"Home": "Home"
}
},
"abcDetails": "asdf",
"mobile": 123455
},
"details2": {
"id": 2,
"empid": "64848",
"details": {
"name": "eryje",
"age": 3027,
"sex": "M",
"Address": {
"Office": "office",
"Home": "Home"
}
},
"abcDetails": "fhkdl",
"mobile": 389928
}
}
I am getting the data in the format above. I split it and iterate over it in a loop. On the first iteration I get the data shown below. From it I want to get the name and age values, and also the details1.Address.Office value (the keys are not static).
"details1": {
"name": "xyz",
"age": "30",
"sex": "M",
"Address": {
"Office": "office",
"Home": "Home"
}
}
Try using JSONObject's keys() method to get the keys, then iterate over them to reach each dynamic value. The snippet uses the names from the referenced question; replace searchResult and "question_mark" with your own object and key (for example, the "transaction" object).
import java.util.Iterator;
import org.json.JSONObject;

// searchResult refers to the current element in the array "search_result"
JSONObject questionMark = searchResult.getJSONObject("question_mark");
Iterator<String> keys = questionMark.keys();
while (keys.hasNext()) {
    // loop to get the dynamic key
    String currentDynamicKey = keys.next();
    // get the value of the dynamic key (throws JSONException if the value is not an object)
    JSONObject currentDynamicValue = questionMark.getJSONObject(currentDynamicKey);
    // do something here with the value...
}
Reference: How to parse a dynamic JSON key in a Nested JSON result?
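The same key-iteration idea can be sketched without a JSON library, assuming the document has already been parsed into nested `java.util.Map` objects (most JSON libraries can produce such a structure). The class `DynamicKeys` and its methods are hypothetical names for this illustration; the sample data mirrors the "transaction" object from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DynamicKeys {
    // Collects "key: name, age, office" for every entry whose value is itself an object.
    static List<String> describeNested(Map<String, Object> parent) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Object> entry : parent.entrySet()) {
            if (!(entry.getValue() instanceof Map)) {
                continue; // plain values such as "id" or "mobile" are skipped
            }
            Map<?, ?> details = (Map<?, ?>) entry.getValue();
            Object address = details.get("Address");
            Object office = address instanceof Map ? ((Map<?, ?>) address).get("Office") : null;
            result.add(entry.getKey() + ": " + details.get("name") + ", "
                    + details.get("age") + ", " + office);
        }
        return result;
    }

    // Builds the "transaction" object from the question as nested maps.
    static Map<String, Object> sampleTransaction() {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("Office", "office");
        address.put("Home", "Home");
        Map<String, Object> details1 = new LinkedHashMap<>();
        details1.put("name", "xyz");
        details1.put("age", "30");
        details1.put("sex", "M");
        details1.put("Address", address);
        Map<String, Object> transaction = new LinkedHashMap<>();
        transaction.put("id", 1);
        transaction.put("empid", "12345");
        transaction.put("details1", details1);
        transaction.put("abcDetails", "asdf");
        transaction.put("mobile", 123455);
        return transaction;
    }

    public static void main(String[] args) {
        // prints "details1: xyz, 30, office"
        describeNested(sampleTransaction()).forEach(System.out::println);
    }
}
```

Because the loop inspects the value type rather than the key name, it keeps working when the key is details1, details2, or anything else.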

Checking Keys in JSON Records using Java

"transaction": {
"id": 1,
"empid": "12345",
"details1": {
"name": "xyz",
"age": "30",
"sex": "M",
"Address": {
"Office": "office",
"Home": "Home"
}
},
"abcDetails": "asdf",
"mobile": 123455
},
I need to test whether a JSON record contains more than two keys (details, Address).
Then I need to pass those keys as input to this line:
parserValue1 = parserValue.asObject().get("firstKey").asObject().get("secondKey");
Can anyone help me?
Many JSON parsers have a has("key") or contains("key") accessor.
Otherwise you will have to add a condition that checks whether get("...") returns null, or turn the whole JSON object into a map and do the same checks there.
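The map-based variant could look roughly like the sketch below. It assumes the record has already been converted into nested `java.util.Map` objects; `KeyCheck`, `hasPath`, and `sampleRecord` are hypothetical names, and the sample data is a trimmed version of the record in the question.

```java
import java.util.HashMap;
import java.util.Map;

final class KeyCheck {
    // Returns true if every key in the path exists, each nested inside the previous one.
    static boolean hasPath(Map<String, Object> root, String... keys) {
        Object current = root;
        for (String key : keys) {
            if (!(current instanceof Map) || !((Map<?, ?>) current).containsKey(key)) {
                return false;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return true;
    }

    // Trimmed version of the record from the question, built as nested maps.
    static Map<String, Object> sampleRecord() {
        Map<String, Object> address = new HashMap<>();
        address.put("Office", "office");
        address.put("Home", "Home");
        Map<String, Object> details1 = new HashMap<>();
        details1.put("name", "xyz");
        details1.put("Address", address);
        Map<String, Object> transaction = new HashMap<>();
        transaction.put("id", 1);
        transaction.put("details1", details1);
        Map<String, Object> root = new HashMap<>();
        root.put("transaction", transaction);
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> record = sampleRecord();
        System.out.println(hasPath(record, "transaction", "details1", "Address")); // prints true
        System.out.println(hasPath(record, "transaction", "details2", "Address")); // prints false
    }
}
```

Checking the path first means the chained get(...) calls are only made once every key is known to exist, which avoids a failure on a missing intermediate object.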
