I have the following JSON input data:
{
"lib": [
{
"id": "a1",
"type": "push",
"icons": [
{
"iId": "111"
}
],
"id": "a2",
"type": "pull",
"icons": [
{
"iId": "111"
},
{
"iId": "222"
}
]
}
]
I want to get the following Dataset:
id type iId
a1 push 111
a2 pull 111
a2 pull 222
How can I do it?
This is my current code. I use Spark 2.3 and Java 1.8:
ds = spark
.read()
.option("multiLine", true).option("mode", "PERMISSIVE")
.json(jsonFilePath);
ds = ds
.select(org.apache.spark.sql.functions.explode(ds.col("lib.icons")).as("icons"));
However the result is wrong:
+---------------+
| icons|
+---------------+
| [[111]]|
|[[111], [222...|
+---------------+
How can I get the correct Dataset?
UPDATE:
I tries this code, but it generates some extra combinations of id, type and iId that do not exist in the input file.
ds = ds
.withColumn("icons", org.apache.spark.sql.functions.explode(ds.col("lib.icons")))
.withColumn("id", org.apache.spark.sql.functions.explode(ds.col("lib.id")))
.withColumn("type", org.apache.spark.sql.functions.explode(ds.col("lib.type")));
ds = ds.withColumn("its", org.apache.spark.sql.functions.explode(ds.col("icons")));
As already pointed out, the JSON String seems to be malformed. with the updated one, you can use the following to get result you wanted:
import org.apache.spark.sql.functions._
spark.read
.format("json")
.load("in/test.json")
.select(explode($"lib").alias("result"))
.select($"result.id", $"result.type", explode($"result.icons").alias("iId"))
.select($"id", $"type", $"iId.iId")
.show
Your JSON appears to be malformed. Fixing the indenting makes this slightly more apparent:
{
"lib": [
{
"id": "a1",
"type": "push",
"icons": [
{
"iId": "111"
}
],
"id": "a2",
"type": "pull",
"icons": [
{
"iId": "111"
},
{
"iId": "222"
}
]
}
]
Does your code work correctly if you feed it this JSON instead?
{
"lib": [
{
"id": "a1",
"type": "push",
"icons": [
{
"iId": "111"
}
]
},
{
"id": "a2",
"type": "pull",
"icons": [
{
"iId": "111"
},
{
"iId": "222"
}
]
}
]
}
Note the inserted }, { just before "id": "a2" to break the object with duplicate keys into two, and the closing } at the very end which had previously been omitted.
Related
I have a JSON output like this:
{
"items": [
{
"id": "1",
"name": "Anna",
"values": [
{
"code": "Latin",
"grade": 1
},
{
"code": "Maths",
"grade": 5
}
]
},
{
"id": "2",
"name": "Mark",
"values": [
{
"code": "Latin",
"grade": 5
},
{
"code": "Maths",
"grade": 5
}
]
}
]
}
I need to get field values for "name": "Anna". I am getting RestAssured Response and would like to use my beans to do that, but I can also use jsonPath() or jsonObject(), but I don't know how. I searched many topics but did not find anything.
I need to create an IDs lists for all objects that have identical (same value and quantity) parameters. I am looking for a solution that will be more efficient than two nested loops and an if.
Object structure in the dataset:
case class MergedProduct(id: String,
products: List[Product])
case class Product(productUrl: String, productId: String)
Example of data in dataset:
[ {
"id": "ID1",
"products": [
{
"product": {
"productUrl": "SOMEURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
}
],
},
{
"id": "ID2",
"products": [
{
"product": {
"productUrl": "SOMEURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
}
],
},
{
"id": "ID3",
"products": [
{
"product": {
"productUrl": "DIFFERENTURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
}
],
},
{
"id": "ID4",
"products": [
{
"product": {
"productUrl": "SOMEOTHERURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "DIFFERENTURL",
"productId": "1"
}
}
],
},
{
"id": "ID5",
"products": [
{
"product": {
"productUrl": "NOTDUPLICATEDURL",
"productId": "1"
}
},
{
"product": {
"productUrl": "DIFFERENTURL",
"productId": "1"
}
}
],
}
]
In this example, we have 4 objects that are duplicated, so I would like to get their ID in the corresponding lists.
Example output is List[List[String]]:
List(List("ID1", "ID2"), List("ID3","ID4"))
I am looking for something efficient and readable - the dataset we are talking about has nearly 700 million objects.
As I can remove the listed duplicates from the dataset (it does not affect the database) because the goal is one - logging them exists, so I was thinking about the solution of taking MergedProduct one by one, searching for other MergedProduct with identical Products, getting their ID, logging in they exist and then remove the mentioned MergedProduct ID from the dataset and move on to the next one until I check the whole dataset but in this case I would have to collect it first as a list of MergedProducts and then do all operations - seems like going around
After trying some options and looking for neat solutions- I think this is kinda ok:
private def getDuplicates(mergedProducts: List[MergedProduct]): List[List[String]] = {
val duplicates = mergedProducts.groupBy(_.products.sortBy(_.product.productId)).filter(_._2.size > 1).values.toList
duplicates.map(duplicates => duplicates.map(_.id))
}
Object sample:
[
{
"name": "aaa",
"list": [
{
"key": "val1"
},
{
"key": "val2"
},
{
"key": "val3"
},
{
"key": "val4"
}
]
},
{
"name": "bbb",
"list": [
{
"key": "val2"
},
{
"key": "val4"
},
{
"key": "val6"
},
{
"key": "val8"
}
]
}
]
Query: list.key = val1 or val6
Actual results:
[
{"key":"val1"},
{"key":"val2"},
{"key":"val3"},
{"key":"val4"},
{"key":"val2"},
{"key":"val4"},
{"key":"val6"},
{"key":"val8"}
]
Expected results:
[
{"key":"val1"},
{"key":"val6"}
]
I need to pick all objects in list that equal to criteria.
#Query(value="{$or :{ 'listKey' : ?0},{ 'listKey' : ?1} }", fields="{ 'listKey' : 1}")
public List<Object> findByListKey(String value,String value2); // val1 or val6
Actually, it retrieves all objects of list in case it contains this value.
Any suggestions?
You need to project array documents using $ operator & for that you need to use $elemMatch in query.
Use this query
#Query(value="{ list: {$elemMatch: {$or: [{ 'key': ?0 }, { 'key': ?1 }]}}}", fields="{ 'list.$':1}")
Below is the JSON response, I have used JSONPath get() to retrieve value of totalContractAmout by using below path - subscriptionQuoteResponseDetails.customerQuoteDetails[0].billPlanQuoteDetails[0].serviceList[0].skuList[0].totalContractAmount
But this seems to be hard coded, is there any way I can make it generic using JAVA language.
{
"transactionId": "Transaction123",
"systemId": "AAA",
"userId": "User123",
"resultDate": "2019-11-23T12:52:16.400-06:00",
"resultCode": "100",
"resultMessage": "SUCCESS",
"subscriptionQuoteResponseDetails": {
"quoteDetailsStatusCode": 2,
"customerQuoteDetails": [
{
"customerId": "546789",
"buid": "111",
"billPlanQuoteDetails": [
{
"serviceList": [
{
"serviceType": "/service/",
"skuList": [
{
"skuId": "932125",
"productName": "DummyName",
"quantity": 4,
"totalContractAmount": 1728,
"rateCards": [
{
"productCadence": "M",
"cadenceAmount": 48
},
{
"productCadence": "F",
"cadenceAmount": 1728
}
]
}
]
}
]
}
]
}
]
}
}
I want to query for inner object and select only filtered inner objects from mongoddb document.
Consider below mongodb document.
{
"schools": [
{
"name": "ABC",
"students": [
{
"name": "ABC 1",
"class": 1
},
{
"name": "ABC 2",
"class": 2
},
{
"name": "ABC 3",
"class": 1
}
]
},
{
"name": "XYZ",
"students": [
{
"name": "XYZ 1",
"class": 1
},
{
"name": "XYZ 2",
"class": 2
}
]
}
]
}
I want to select only students in class 1.
expected result json as below.
{
"school": {
"name": "ABC",
"students": [
{
"name": "ABC 1",
"class": 1
},
{
"name": "ABC 3",
"class": 1
}
]
},
"school": {
"name": "XYZ",
"students": [
{
"name": "XYZ 1",
"class": 1
}
]
}
}
Even below result is fine with me.
{
"students": [
{
"name": "ABC 1",
"class": 1
},
{
"name": "ABC 3",
"class": 1
},
{
"name": "XYZ 1",
"class": 1
}
]
}
Please help me to get this done.
Really helpful if can provide mongodb query.
I am using mongodb with spring data in my application.
You can search for mongo db array nested record search. The example code is here. Document is here.
db.your_collection_name.find({'school.student.class':1})
If you only want students do flatmap for your results. Here is document for flatmap in mongodb
Finally I could be able to find the query.
First I have to unwind and apply matching criteria. this was work for me.
db.{mycollection}.aggregate(
[
{ $unwind: '$schools.students'},
{ $match : { "schools.students.class" : 1 } },
{ $project : { "schools.name" : 1, 'schools.students' : 1 } }
]
);