Spark SQL nested arrays and beans support - java

Each hour I receive some value updates as a new DataFrame. I have to reduce the DataFrames in order to deduplicate entities and track the history of value updates. Because the reduce logic is too complex, I'm converting the DataFrames to a JavaRDD, reducing, and then converting the JavaRDD back to a DataFrame.
The issue is that I have to use nested data structures after reduce.
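As a rough illustration of the reduce step described above, here is a minimal sketch in plain Java, without Spark. The History and Entity classes are simplified stand-ins for the beans shown further down, and merge and dedupe are hypothetical names. The actual reduce logic is not shown in this question; it is assumed here to concatenate the histories per id and sort them by timestamp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for the Entity/History beans in this question.
final class HistoryMergeSketch {

    static final class History {
        final long timestamp; // epoch millis
        final double value;

        History(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    static final class Entity {
        final int id;
        final List<History> history;

        Entity(int id, List<History> history) {
            this.id = id;
            this.history = history;
        }
    }

    // Assumed merge rule: keep the id, concatenate both histories, sort by timestamp.
    static Entity merge(Entity a, Entity b) {
        List<History> all = new ArrayList<>(a.history);
        all.addAll(b.history);
        all.sort(Comparator.comparingLong(h -> h.timestamp));
        return new Entity(a.id, all);
    }

    // Local equivalent of keying by id, reducing by key with merge, and taking the values.
    static List<Entity> dedupe(List<Entity> updates) {
        Map<Integer, Entity> byId = new LinkedHashMap<>();
        for (Entity e : updates) {
            byId.merge(e.id, e, HistoryMergeSketch::merge);
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Entity> updates = Arrays.asList(
                new Entity(1, Arrays.asList(new History(2000L, 13.2))),
                new Entity(1, Arrays.asList(new History(1000L, 12.5))),
                new Entity(2, Arrays.asList(new History(1500L, 7.0))));
        for (Entity e : dedupe(updates)) {
            System.out.println(e.id + " -> " + e.history.size() + " history entries");
        }
    }
}
```

The result of such a reduce is one entity per id carrying a list of history records, which is why a nested array of beans is needed on the way back to a DataFrame.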
Question
I've read about inferring the schema using reflection, but it's still not clear to me:
Does Spark SQL support only nested arrays of primitives, or nested arrays of beans too?
Why doesn't the Case 1 code work while the Case 2 code does?
Case 1
From the following code I got:
scala.MatchError: History(timestamp=1970-01-01 00:00:00.0,
value=10.0) (of class com.somepackage.History)
So I might conclude that Spark does not support nested arrays of beans. But see Case 2.
@Data
@NoArgsConstructor
@AllArgsConstructor
public static class Entity implements Serializable {
private Integer id;
private History[] history;
}
@Data
@NoArgsConstructor
@AllArgsConstructor
public static class History implements Serializable {
private Timestamp timestamp;
private Double value;
}
JavaRDD<Entity> rdd = JavaSparkContext
.fromSparkContext(spark().sparkContext())
.parallelize(asList(
new Entity(1, new History[] {
new History(new Timestamp(0L), 10.0)
})
));
spark()
//EXCEPTION HERE!
.createDataFrame(rdd, Entity.class)
.show();
Case 2
On the other hand, the following code works correctly with nested arrays of beans:
Dataset<Entity> dataSet = spark()
.read()
.option("multiLine", true).option("mode", "PERMISSIVE")
.schema(fromJson("/data/testSchema.json"))
.json(getAbsoluteFilePath("data/testData.json"))
.as(Encoders.bean(Entity.class));
dataSet
.toJavaRDD()
.mapToPair(o -> tuple(RowFactory.create(o.getId()), o))
.reduceByKey((o1, o2) -> o2)
.values()
.saveAsTextFile("output.json");
-------
private String getAbsoluteFilePath(String relativePath) {
return this
.getClass()
.getClassLoader()
.getResource("")
.getPath() + relativePath;
}
private StructType fromJson(String pathToSchema) {
return (StructType) StructType.fromJson(
new BufferedReader(
new InputStreamReader(
Resources.class.getResourceAsStream(pathToSchema)
)
)
.lines()
.collect(Collectors.joining(System.lineSeparator()))
);
}
testData.json
[
{
"id": 1,
"history": [
{
"timestamp": "2018-10-29 23:11:44.000",
"value": 12.5
}
]
},
{
"id": 1,
"history": [
{
"timestamp": "2018-10-30 14:43:05.000",
"value": 13.2
}
]
}
]
testSchema.json
{
"type": "struct",
"fields": [
{
"name": "id",
"type": "integer",
"nullable": true,
"metadata": {}
},
{
"name": "history",
"type": {
"type": "array",
"elementType": {
"type": "struct",
"fields": [
{
"name": "timestamp",
"type": "timestamp",
"nullable": true,
"metadata": {}
},
{
"name": "value",
"type": "double",
"nullable": true,
"metadata": {}
}
]
},
"containsNull": true
},
"nullable": true,
"metadata": {}
}
]
}

Related

How to add link to parent object in schema

I have a simple dto
@Getter
@Setter
@Schema(title = "TestDto", description = "Test dto")
public class TestDto {
private Integer id;
private String value;
@ArraySchema(schema = @Schema(implementation = TestDto.class))
private List<TestDto> children;
}
When I generate the schema, I see:
"testDto": [
{
"id": 0,
"value": "string",
"children":["string"]
}
but I need something like this:
"testDto": [
{
"id": 0,
"value": "string",
"children":[
{"id": 0,
"value": "string",
"children":[{}]}]
}
or like this
"testDto": [
{
"id": 0,
"value": "string",
"children":["testDto"]
}
Is there any way to do that?

Deserializing complex json with matching objects by id (Jackson)

I have a proprietary API that returns complex JSON like:
{
"store": "store_name",
"address": "store_address",
"department": [
{
"name": "d1",
"type": "t1",
"items": [
"i1",
"i2"
]
},
{
"name": "d2",
"type": "t2",
"items": [
"i3"
]
}
],
"itemDescriptions": [
{
"id": "i1",
"description": "desc1"
},
{
"id": "i2",
"description": "desc2",
"innerItems": [
"i2"
]
},
{
"id": "i3",
"description": "desc3"
}
]
}
Is it possible to deserialize this JSON using Jackson into:
@AllArgsConstructor
class Store {
private final String store;
private final String address;
private final List<Department> departments;
/*some logic*/
}
@AllArgsConstructor
class Department {
private final String name;
private final String type;
private final List<Item> items;
/*some logic*/
}
@AllArgsConstructor
class Item {
private final String id;
private final String description;
private final List<Item> innerItems;
/*some logic*/
}
I tried to find answers, but found only this question, which has no solution.
I know that I can do it in my own code (deserialize the JSON as it is and build the objects from the result), but that is very memory intensive (I have a lot of JSON and it can be large).
I know that I can write a fully custom deserializer, but then I have to describe the deserialization of each field myself; whenever something changes, I will have to change the deserializer and not just the class (POJO/DTO).
Is there a way to do this with Jackson (or Gson) or with a minimal (preferably relatively generic) amount of my code?
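For reference, the manual approach mentioned above (deserialize as-is, then build the objects) can be sketched as a two-pass id resolution in plain Java. RawItem, Item, and link are hypothetical names, and RawItem stands for whatever an as-is deserialization of one "itemDescriptions" entry would produce. Note that "i2" lists itself in "innerItems", so the sketch gives Item a mutable list instead of the all-final fields in the question. It also keeps one lookup map in memory, so it does not address the memory concern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ItemLinkSketch {

    // Hypothetical raw shape of one "itemDescriptions" entry: inner items are still ids.
    static final class RawItem {
        final String id;
        final String description;
        final List<String> innerItems;

        RawItem(String id, String description, List<String> innerItems) {
            this.id = id;
            this.description = description;
            this.innerItems = innerItems;
        }
    }

    // Resolved item; innerItems is filled in a second pass so that forward
    // references and self-references can be wired up.
    static final class Item {
        final String id;
        final String description;
        final List<Item> innerItems = new ArrayList<>();

        Item(String id, String description) {
            this.id = id;
            this.description = description;
        }
    }

    static Map<String, Item> link(List<RawItem> raw) {
        // Pass 1: create every item without its references.
        Map<String, Item> byId = new LinkedHashMap<>();
        for (RawItem r : raw) {
            byId.put(r.id, new Item(r.id, r.description));
        }
        // Pass 2: replace each inner id with the object created in pass 1.
        for (RawItem r : raw) {
            Item target = byId.get(r.id);
            for (String innerId : r.innerItems) {
                Item inner = byId.get(innerId);
                if (inner == null) {
                    throw new IllegalArgumentException("Unknown item id: " + innerId);
                }
                target.innerItems.add(inner);
            }
        }
        return byId;
    }

    public static void main(String[] args) {
        Map<String, Item> items = link(Arrays.asList(
                new RawItem("i1", "desc1", Collections.<String>emptyList()),
                new RawItem("i2", "desc2", Arrays.asList("i2")),
                new RawItem("i3", "desc3", Collections.<String>emptyList())));
        for (Item item : items.values()) {
            System.out.println(item.id + " has " + item.innerItems.size() + " inner item(s)");
        }
    }
}
```

The id lists in each department's "items" could be resolved against the same map in the same way.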

Deserialize complex JSON to Java, classes nested multiple levels deep

I am trying to make the Json output from Cucumber into a single Java object. This contains objects nested four levels deep, and I am having trouble deserializing it. I am presently using Jackson, but open to suggestions.
Here is my Json code:
{
"line": 1,
"elements": [
{
"line": 3,
"name": "Converteren centimeters naar voeten/inches",
"description": "",
"id": "applicatie-neemt-maten-in-cm-en-converteert-ze-naar-voet/inch,-en-vice-versa;converteren-centimeters-naar-voeten/inches",
"type": "scenario",
"keyword": "Scenario",
"steps": [
{
"result": {
"duration": 476796588,
"status": "passed"
},
"line": 4,
"name": "maak Maten-object aan met invoer in \"centimeters\"",
"match": {
"arguments": [
{
"val": "centimeters",
"offset": 37
}
],
"location": "StepDefinition.maakMatenObjectAanMetInvoerIn(String)"
},
"keyword": "Given "
},
{
"result": {
"duration": 36319,
"status": "passed"
},
"line": 5,
"name": "ik converteer",
"match": {
"location": "StepDefinition.converteerMaten()"
},
"keyword": "When "
},
{
"result": {
"duration": 49138,
"status": "passed"
},
"line": 6,
"name": "uitvoer bevat maat in \"voeten/inches\"",
"match": {
"arguments": [
{
"val": "voeten/inches",
"offset": 23
}
],
"location": "StepDefinition.uitvoerBevatMaatIn(String)"
},
"keyword": "Then "
}
]
},
{
"line": 8,
"name": "Converteren voeten/inches naar centimeters",
"description": "",
"id": "applicatie-neemt-maten-in-cm-en-converteert-ze-naar-voet/inch,-en-vice-versa;converteren-voeten/inches-naar-centimeters",
"type": "scenario",
"keyword": "Scenario",
"steps": [
{
"result": {
"duration": 84175,
"status": "passed"
},
"line": 9,
"name": "maak Maten-object aan met invoer in \"voeten/inches\"",
"match": {
"arguments": [
{
"val": "voeten/inches",
"offset": 37
}
],
"location": "StepDefinition.maakMatenObjectAanMetInvoerIn(String)"
},
"keyword": "Given "
},
{
"result": {
"duration": 23928,
"status": "passed"
},
"line": 10,
"name": "ik converteer",
"match": {
"location": "StepDefinition.converteerMaten()"
},
"keyword": "When "
},
{
"result": {
"duration": 55547,
"status": "passed"
},
"line": 11,
"name": "uitvoer bevat maat in \"centimeters\"",
"match": {
"arguments": [
{
"val": "centimeters",
"offset": 23
}
],
"location": "StepDefinition.uitvoerBevatMaatIn(String)"
},
"keyword": "Then "
}
]
}
],
"name": "Applicatie neemt maten in cm en converteert ze naar voet/inch, en vice versa",
"description": "",
"id": "applicatie-neemt-maten-in-cm-en-converteert-ze-naar-voet/inch,-en-vice-versa",
"keyword": "Feature",
"uri": "sample.feature"
}
I have tried a number of different approaches. First I used nested inner classes, but it appeared you had to make them static, which I feared would not work since I have multiple instances of the same object within one (multiple "element"-objects in the root, for example). Then I tried putting them in separate classes, with Json annotations. Here's where that got me (omitting setters):
public class CucumberUitvoer {
private String name;
private String description;
private String id;
private String keyword;
private String uri;
private int line;
@JsonProperty("elements")
private List<FeatureObject> elements;
public CucumberUitvoer(){}
}
public class FeatureObject {
private String name;
private String description;
private String id;
private String type;
private String keyword;
private int line;
@JsonProperty("steps")
private List<StepObject> steps;
public FeatureObject() {
}
}
public class StepObject {
@JsonProperty("result")
private ResultObject result;
private String name;
private String given;
private String location;
private String keyword;
private int line;
@JsonProperty("match")
private MatchObject match;
public StepObject(){}
}
public class ResultObject {
private int duration;
private String status;
public ResultObject(){}
}
public class MatchObject {
@JsonProperty("arguments")
private List<ArgumentObject> arguments;
private String location;
public MatchObject(){}
}
public class ArgumentObject {
private String val;
private String offset;
public ArgumentObject(){}
}
For clarification, here's a class diagram of how the nesting works.
This solution gives me the following error:
com.fasterxml.jackson.databind.JsonMappingException: Can not deserialize instance of nl.icaprojecten.TestIntegratieQuintor.JSONInterpreter.CucumberUitvoer out of START_ARRAY token
Here is the code doing the actual mapping:
ObjectMapper mapper = new ObjectMapper();
mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
CucumberUitvoer obj1 = null;
try {
obj1 = mapper.readValue(json, CucumberUitvoer.class);
} catch (IOException e) {
e.printStackTrace();
}
Is there a quick fix to this approach to make it work, or should I try something entirely different?
OK, I spent some time debugging to figure out what the problem was, and in the end it was something pretty obvious.
implements Serializable
That's the line I added to MatchObject, and it worked.
In my case, I first had to make the classes being deserialized implement the Serializable interface.
I just tried your sample code and oddly, it works.
Can you please double check your imports, if the JSON is coming in as provided and the getters, setters, constructors are actually there?
You can get the idea from this code to deserialize,
// Test, Element and Steps are placeholder POJOs (with the setters used below) standing in for your own classes.
public class TestCustomDeserializer extends JsonDeserializer<Test> {
@Override
public Test deserialize(JsonParser p, DeserializationContext ctx) throws IOException {
ObjectCodec objectCodec = p.getCodec();
JsonNode node = objectCodec.readTree(p);
ObjectMapper objectMapper = new ObjectMapper();
Test test = new Test();
test.setId(node.get("line").asText());
List<Element> elementList = new ArrayList<>();
JsonNode elementsNode = node.get("elements");
Iterator<JsonNode> elementIterator = elementsNode.elements();
while (elementIterator.hasNext()) {
JsonNode elementNode = elementIterator.next();
// The JSON key is lowercase "steps" and holds an array.
JsonNode stepsNode = elementNode.get("steps");
Steps[] steps = objectMapper.treeToValue(stepsNode, Steps[].class);
Element element = new Element();
element.setSteps(Arrays.asList(steps));
elementList.add(element);
// continue mapping the remaining fields
}
test.setElements(elementList);
return test;
}
}
Hope it helps

Mapping JSONArray in RestTemplate Spring

I am trying to map this JSONArray using Spring RestTemplate:
[{
"Command": "/usr/sbin/sshd -D",
"Created": 1454501297,
"Id": "e00ca61f134090da461a3f39d47fc0cbeda77fbbc0610439d3c16a932686b612",
"Image": "ubuntu:latest",
"Labels": {
},
"Names": [
"/nova-c1896fbd-1309-4da2-8d77-b4fe4c02fa8e"
],
"Ports": [
],
"Status": "Up 2 hours"
}, {
"Command": "/usr/sbin/sshd -D",
"Created": 1450106126,
"Id": "7ffc9dbdd200e2c23adec442abd656ed57306955332697cb7da979f36ebf3b22",
"Image": "ubuntu:latest",
"Labels": {
},
"Names": [
"/nova-93b9ae40-8135-48b7-ac17-12094603b28c"
],
"Ports": [
],
"Status": "Up 2 hours"
}]
Here is ContainersInfo class:
@JsonIgnoreProperties(ignoreUnknown = true)
public class ContainersInfo {
private String Id;
private List<String> Names;
public String getId() {
return Id;
}
public void setId(String id) {
Id = id;
}
public List<String> getNames() {
return Names;
}
public void setNames(List<String> names) {
Names = names;
}
}
However I get null when I want to get the data:
ContainersInfo[] containers = syncRestTemplate.getForObject("http://192.168.1.2:4243/containers/json?all=1", ContainersInfo[].class);
for (int i = 0; i < containers.length; i++)
System.out.println("id:" + containers[i].getId());
The resulting output is as follows:
id:null
id:null
Any idea, what I should do?
Your JSON field names are in Pascal case rather than camel case (which is the more common convention). Set the Jackson naming strategy to PascalCaseStrategy, for example by adding the @JsonNaming(PascalCaseStrategy.class) annotation to the ContainersInfo class.

minItems=1 and "uniqueItems": true for array type property in mongodb @Document

I am using Spring Data for a REST API. There is a document with the following properties:
"properties": {
"id": {
"type": "string",
"description": "Object id from the database"
},
"name": {
"type": "string",
"description": "English User name"
},
"continents": {
"type": "array",
"description": "Continents the User exist in",
"minItems": 1,
"items": { "type": "string" },
"uniqueItems": true
}
}
For these properties I am writing a java persistent class as below
@Document(collection = "users")
public class User {
private String id;
private String name;
// which annotation should I apply here for the constraints "minItems": 1 and "uniqueItems": true?
private String[] continents;
// getter and setters
}
I want to support "minItems": 1 and "uniqueItems": true for continents. Can anyone please help me with how to do this in User.java?
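One option that needs no annotation at all is to enforce both constraints in the setter. The following is a minimal sketch in plain Java, assuming it is acceptable to validate inside the class itself. UserContinentsSketch is a hypothetical stand-in for User, and it takes a Collection rather than a String[] so that duplicates are easy to detect. It is not a Spring Data feature.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for the User class; only the continents property is shown.
final class UserContinentsSketch {

    private Set<String> continents = Collections.emptySet();

    void setContinents(Collection<String> values) {
        // "minItems": 1 -> reject null or empty input.
        if (values == null || values.isEmpty()) {
            throw new IllegalArgumentException("continents must contain at least one item");
        }
        // "uniqueItems": true -> reject input that shrinks when copied into a Set.
        Set<String> unique = new LinkedHashSet<>(values);
        if (unique.size() != values.size()) {
            throw new IllegalArgumentException("continents must not contain duplicates");
        }
        this.continents = Collections.unmodifiableSet(unique);
    }

    Set<String> getContinents() {
        return continents;
    }

    public static void main(String[] args) {
        UserContinentsSketch user = new UserContinentsSketch();
        user.setContinents(Arrays.asList("Europe", "Asia"));
        System.out.println(user.getContinents());
        try {
            user.setContinents(Arrays.asList("Europe", "Europe"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Validating in the setter keeps the rule next to the field, but it only runs when the setter is called, so objects populated through field access would bypass it.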
