Convert complex nested JSON to Spark DataFrame in Java

Can anyone help with the Java code to convert the following JSON to a Spark DataFrame?
Note: it is not a file.
Logic:
Listen to Kafka topic T1, read each record in the RDD, apply additional logic, convert the resulting data to a JSON object, and write it to another Kafka topic T2.
The structure of T2 is below.
JSON:
[
  {
    "#tenant_id": "XYZ",
    "alarmUpdateTime": 1526342400000,
    "alarm_id": "AB5C9123",
    "alarm_updates": [
      {
        "alarmField": "Severity",
        "new_value": "Minor",
        "old_value": "Major"
      },
      {
        "alarmField": "state",
        "new_value": "UPDATE",
        "old_value": "NEW"
      }
    ],
    "aucID": "5af83",
    "inID": "INC15234567",
    "index": "test",
    "product": "test",
    "source": "ABS",
    "state": "NEW"
  }
]
Classes created:
class Alarm {
    String #tenant_id;
    String alarm_id;
    .
    .
    List<AlarmUpdate> update;
    // getter and setter functions for all variables
}
class AlarmUpdate {
    String alarmField;
    String oldVal;
    String NewVal;
    // getter and setter functions for all variables
}
class AppClass {
    static void main(String[] args) {
        Alarm alarmObj = new Alarm();
        // set values for variables in alarmObj
        Dataset<Row> results = jobCtx.getSparkSession()
                .createDataFrame(Arrays.asList(alarmObj), Alarm.class);
        // at this point the following error is thrown
    }
}
Error:
2018-05-15 13:40:48 ERROR JobScheduler - Error running job streaming
job 1526406040000 ms.0 scala.MatchError:
com.ca.alarmupdates.AlarmUpdate#48c8809b (of class
com.ca.alarmupdates.AlarmUpdate)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:236)
~[spark-catalyst_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:231)
~[spark-catalyst_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
~[spark-catalyst_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.toCatalystImpl(CatalystTypeConverters.scala:170)
~[spark-catalyst_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.toCatalystImpl(CatalystTypeConverters.scala:154)
~[spark-catalyst_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
~[spark-catalyst_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:379)
~[spark-catalyst_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1105)
~[spark-sql_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1105)
~[spark-sql_2.11-2.2.0.jar:2.2.0]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
~[jaf-sdk-2.4.0.jar:?]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
~[jaf-sdk-2.4.0.jar:?]
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
~[jaf-sdk-2.4.0.jar:?]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
~[jaf-sdk-2.4.0.jar:?]
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
~[jaf-sdk-2.4.0.jar:?]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
~[jaf-sdk-2.4.0.jar:?]
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1105)
~[spark-sql_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1103)
~[spark-sql_2.11-2.2.0.jar:2.2.0]
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
~[jaf-sdk-2.4.0.jar:?]
at scala.collection.Iterator$class.toStream(Iterator.scala:1322)
~[jaf-sdk-2.4.0.jar:?]
at scala.collection.AbstractIterator.toStream(Iterator.scala:1336)
~[jaf-sdk-2.4.0.jar:?]
at scala.collection.TraversableOnce$class.toSeq(TraversableOnce.scala:298)
~[jaf-sdk-2.4.0.jar:?]
at scala.collection.AbstractIterator.toSeq(Iterator.scala:1336)
~[jaf-sdk-2.4.0.jar:?]
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:406)
~[spark-sql_2.11-2.2.0.jar:2.2.0]
at com.ca.alarmupdates.AlarmUpdates.lambda$null$0(AlarmUpdates.java:85)
~[classes/:?]
at java.util.Arrays$ArrayList.forEach(Arrays.java:3880) ~[?:1.8.0_161]
at com.ca.alarmupdates.AlarmUpdates.lambda$main$f87f782d$1(AlarmUpdates.java:58)
~[classes/:?]
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:272)
~[spark-streaming_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:272)
~[spark-streaming_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
~[spark-streaming_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
~[spark-streaming_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
~[spark-streaming_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
~[spark-streaming_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
~[spark-streaming_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
~[spark-streaming_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
~[spark-streaming_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
~[spark-streaming_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
~[spark-streaming_2.11-2.2.0.jar:2.2.0]
at scala.util.Try$.apply(Try.scala:192) ~[jaf-sdk-2.4.0.jar:?]
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
~[spark-streaming_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257)
~[spark-streaming_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
~[spark-streaming_2.11-2.2.0.jar:2.2.0]
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
~[spark-streaming_2.11-2.2.0.jar:2.2.0]
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
~[jaf-sdk-2.4.0.jar:?]
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256)
~[spark-streaming_2.11-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
~[?:1.8.0_161]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
~[?:1.8.0_161]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_161]

You can use wholeTextFiles to read the JSON file, get the JSON text, and pass it to the json API of SparkSession:
import org.apache.spark.sql.SparkSession;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
static SparkSession spark = SparkSession.builder().master("local").appName("simple").getOrCreate();
static JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());
Dataset<Row> df = spark.read().json(sc.wholeTextFiles("path to json file").map(t -> t._2()));
df.show(false);
and you should get
+----------+---------------+--------+--------------------------------------------+-----+-----------+-----+-------+------+-----+
|#tenant_id|alarmUpdateTime|alarm_id|alarm_updates |aucID|inID |index|product|source|state|
+----------+---------------+--------+--------------------------------------------+-----+-----------+-----+-------+------+-----+
|XYZ |1526342400000 |AB5C9123|[[Severity,Minor,Major], [state,UPDATE,NEW]]|5af83|INC15234567|test |test |ABS |NEW |
+----------+---------------+--------+--------------------------------------------+-----+-----------+-----+-------+------+-----+
You can use the master and appName as you prefer.
Updated
You have commented that
The way you do it through a file, can we do it with the object? I have to convert it to ingest the data into the other topic T2
For that, let's say you have a record read from topic T1 as a string object:
String t1Record = "[\n" +
" {\n" +
" \"#tenant_id\":\"XYZ\",\n" +
" \"alarmUpdateTime\":1526342400000,\n" +
" \"alarm_id\":\"AB5C9123\",\n" +
" \"alarm_updates\":[\n" +
" {\n" +
" \"alarmField\":\"Severity\",\n" +
" \"new_value\":\"Minor\",\n" +
" \"old_value\":\"Major\"\n" +
" },\n" +
" {\n" +
" \"alarmField\":\"state\",\n" +
" \"new_value\":\"UPDATE\",\n" +
" \"old_value\":\"NEW\"\n" +
" }\n" +
" ],\n" +
" \"aucID\":\"5af83\",\n" +
" \"inID\":\"INC15234567\",\n" +
" \"index\":\"test\",\n" +
" \"product\":\"test\",\n" +
" \"source\":\"ABS\",\n" +
" \"state\":\"NEW\"\n" +
" }\n" +
"]";
and you convert it into an RDD as
JavaRDD<String> t1RecordRDD = sc.parallelize(Arrays.asList(t1Record));
Then you can apply the json API to convert it into a DataFrame:
Dataset<Row> df = spark.read().json(t1RecordRDD);
which should give you the same result as above
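If you would rather start from the in-memory object than from a hand-written JSON string, one option (just a sketch, assuming Jackson is on the classpath and that the Alarm/AlarmUpdate beans have proper getters; the #tenant_id field would additionally need something like a @JsonProperty("#tenant_id") annotation, since # is not a valid character in a Java identifier) is to serialize the bean to JSON first and feed that string to the same json API, which avoids the bean-to-Row conversion that produced the MatchError:
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.spark.api.java.JavaRDD;

// Serialize the populated Alarm bean to a JSON string with Jackson,
// then let Spark infer the nested schema from that JSON, as above.
Alarm alarmObj = new Alarm();
// ... set values for the variables in alarmObj ...
// writeValueAsString throws JsonProcessingException; handle or declare it.
String alarmJson = new ObjectMapper().writeValueAsString(alarmObj);

JavaRDD<String> alarmJsonRDD = sc.parallelize(Arrays.asList(alarmJson));
Dataset<Row> df = spark.read().json(alarmJsonRDD);
df.show(false);
Here sc and spark are the same JavaSparkContext and SparkSession used in the snippets above.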

Related

Parse map attribute from List<Map<String, String>> [duplicate]

I recently stumbled on an issue with parsing mapped values which are handed over via a List.
I receive a JSON payload, and within the JSON there is an extra field, attributes, which looks like this:
"attributes": [
{
"id": "id",
"value": "12345677890124566"
},
{
"id": "Criticality",
"value": "medium"
},
{
"id": "type",
"value": "business"
},
{
"id": "active",
"value": "true"
}
],
I fetch it by casting it into a list via (List<Map<String, String>>) request.get("attributes").
I iterate through the list via for (Map<String, String> attribute : attributes).
I am not able to get the value of any attribute. I tried things like get("active"), containsKey, and much more; the only result I get is null.
I tried parsing the value from the mapping for an attribute but received only null instead of the value.
Your attributes data is not a map in the Java sense. It is an array of objects, each with an id and a value.
This will work:
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonMappingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
public class AttributeParser {
    public static void main(String[] args) throws JsonMappingException, JsonProcessingException {
        ObjectMapper objectMapper = new ObjectMapper();
        String requestJSON = "{ \"attributes\": [\r\n"
                + " {\r\n"
                + " \"id\": \"id\",\r\n"
                + " \"value\": \"12345677890124566\"\r\n"
                + " },\r\n"
                + " {\r\n"
                + " \"id\": \"Criticality\",\r\n"
                + " \"value\": \"medium\"\r\n"
                + " },\r\n"
                + " {\r\n"
                + " \"id\": \"type\",\r\n"
                + " \"value\": \"business\"\r\n"
                + " },\r\n"
                + " {\r\n"
                + " \"id\": \"active\",\r\n"
                + " \"value\": \"true\"\r\n"
                + " }\r\n"
                + " ]}";
        JsonNode request = objectMapper.readTree(requestJSON);
        ArrayNode attributesNode = (ArrayNode) request.get("attributes");
        attributesNode.forEach(jsonNode -> {
            System.out.println("id: " + jsonNode.get("id").asText() + ", value: " + jsonNode.get("value").asText());
        });
    }
}
You try to access the map data with get("active"),
but you should use get("value").
I hope this example can help:
for (Map<String, String> attribute : attributes) {
    String value = attribute.get("value");
    System.out.println(value);
}
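If you do keep the (List<Map<String, String>>) cast from the question, note that each map holds exactly the keys id and value, so looking up a specific attribute means filtering by id first. A minimal sketch under that assumption (attributes is the casted list from the question, and the method name is illustrative):
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Each entry looks like {"id": "...", "value": "..."}; find the value for a given id.
static Optional<String> attributeValue(List<Map<String, String>> attributes, String id) {
    return attributes.stream()
            .filter(attribute -> id.equals(attribute.get("id")))
            .map(attribute -> attribute.get("value"))
            .findFirst();
}

// Usage: attributeValue(attributes, "active").orElse(null) would yield "true" for the JSON above.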

kafka producer throwing error when sending an Avro record to topic:io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: null

I am trying to create a topic and produce an Avro record to a Kafka topic, but I am facing an issue while sending the record with the producer. I am getting an error.
org.apache.kafka.common.errors.SerializationException: Error registering Avro schema: {"type":"record","name":"MyClass","namespace":"com.test.avro","fields":[{"name":"event_type","type":"string"},{"name":"time_stamp","type":"long"},{"name":"context","type":{"type":"record","name":"context","fields":[{"name":"device","type":"string"},{"name":"type","type":"string"}]}}]}
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: null; error code: 0
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:294)
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: null; error code: 0
I am not able to fix the issue.
I tried changing the version of the Kafka Avro serializer from 5.3.0 to 7.1.1:
implementation group: 'io.confluent', name: 'kafka-avro-serializer', version: '5.3.0'
I am facing the same issue in all the versions.
My code was able to create the topic but throws an error when I send/submit a record to the topic:
producer.send(record);
Sharing my code as well.
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;
public class AvroSerialization {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                org.apache.kafka.common.serialization.StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                io.confluent.kafka.serializers.KafkaAvroSerializer.class);
        props.put("schema.registry.url", "http://localhost:8081");
        KafkaProducer<Object, Object> producer = new KafkaProducer<>(props);
        String key = "key1";
        String userSchema = "{\n" +
                " \"type\" : \"record\",\n" +
                " \"name\" : \"MyClass\",\n" +
                " \"namespace\" : \"com.test.avro\",\n" +
                " \"fields\" : [ {\n" +
                " \"name\" : \"event_type\",\n" +
                " \"type\" : \"string\"\n" +
                " }, {\n" +
                " \"name\" : \"time_stamp\",\n" +
                " \"type\" : \"long\"\n" +
                " }, {\n" +
                " \"name\" : \"context\",\n" +
                " \"type\" : {\n" +
                " \"type\" : \"record\",\n" +
                " \"name\" : \"context\",\n" +
                " \"fields\" : [ {\n" +
                " \"name\" : \"device\",\n" +
                " \"type\" : \"string\"\n" +
                " }, {\n" +
                " \"name\" : \"type\",\n" +
                " \"type\" : \"string\"\n" +
                " }]\n" +
                " }\n" +
                " } ]\n" +
                "}";
        Schema.Parser parser = new Schema.Parser();
        Schema schema = parser.parse(userSchema);
        GenericRecord avroRecord = new GenericData.Record(schema);
        GenericRecord context = new GenericData.Record(schema.getField("context").schema());
        avroRecord.put("event_type", "visit");
        avroRecord.put("time_stamp", System.currentTimeMillis());
        context.put("device", "ios");
        context.put("type", "iphone12max");
        avroRecord.put("context", context);
        ProducerRecord<Object, Object> record = new ProducerRecord<>("avrotopic", avroRecord);
        try {
            producer.send(record);
        } catch (Throwable e) {
            e.printStackTrace();
            // may need to do something with it
        }
        // When you're finished producing records, you can flush the producer to ensure it has all been written to Kafka and
        // then close the producer to free its resources.
        finally {
            producer.flush();
            producer.close();
        }
    }
}

Groovy JsonBuilder: using field names that are variable names too

I'm using groovy.json.JsonBuilder and having trouble specifying a field name that is also the name of a variable in the current scope.
This works:
System.out.println(new GroovyShell().evaluate(
"def builder = new groovy.json.JsonBuilder();"
+ "def age = 23;"
+ "builder.example {"
+ " name 'Fred';"
+ " 'age1' 27;"
+ " blah {"
+ " foo 'bar';"
+ " };"
+ "};"
+ "return builder.toPrettyString()"));
And produces output:
{
  "example": {
    "name": "Fred",
    "age1": 27,
    "blah": {
      "foo": "bar"
    }
  }
}
But this fails (the field 'age' for some reason conflicts with the variable):
System.out.println(new GroovyShell().evaluate(
"def builder = new groovy.json.JsonBuilder();"
+ "def age = 23;"
+ "builder.example {"
+ " name 'Fred';"
+ " 'age' 27;"
+ " blah {"
+ " foo 'bar';"
+ " };"
+ "};"
+ "return builder.toPrettyString()"));
And produces an exception:
groovy.lang.MissingMethodException: No signature of method: java.lang.Integer.call() is applicable for argument types: (Integer) values: [27]
Possible solutions: wait(), abs(), any(), wait(long), wait(long, int), max(int, int)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.unwrap(ScriptBytecodeAdapter.java:72)
at org.codehaus.groovy.runtime.callsite.PojoMetaClassSite.call(PojoMetaClassSite.java:48)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:128)
at Script1$_run_closure1.doCall(Script1.groovy:5)
at Script1$_run_closure1.doCall(Script1.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:104)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:326)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:264)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041)
at groovy.lang.Closure.call(Closure.java:421)
at groovy.lang.Closure.call(Closure.java:415)
at groovy.json.JsonDelegate.cloneDelegateAndGetContent(JsonDelegate.java:91)
at groovy.json.JsonBuilder.invokeMethod(JsonBuilder.java:314)
at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:47)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:128)
at Script1.run(Script1.groovy:3)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:444)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:482)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:453)
I do not want to change the name of my variable. Is there some way to force JsonBuilder to accept the field name? As you can see, I tried to put it in quotes, but that didn't help.
Use delegate.age so that the call is dispatched to the closure's delegate (the JSON builder) instead of resolving to the local variable.
System.out.println(new GroovyShell().evaluate(
"def builder = new groovy.json.JsonBuilder();"
+ "def age = 23;"
+ "builder.example {"
+ " name 'Fred';"
+ " delegate.age 27;"
+ " blah {"
+ " foo 'bar';"
+ " };"
+ "};"
+ "return builder.toPrettyString()"));
should give you
{
  "example": {
    "name": "Fred",
    "age": 27,
    "blah": {
      "foo": "bar"
    }
  }
}

lift json :Custom serializer for java 8 LocalDateTime throwing mapping exception

I have a class named Child2 which I want to serialize and deserialize. My class contains a LocalDateTime attribute, for which I have to write a custom serializer. I tried the two solutions below, but both throw exceptions.
Here is my code.
Solution 1
case class Child2(var str:String,var Num:Int,MyList:List[Int],val myDate : LocalDateTime = LocalDateTime.now()){
var number:Int=555
}
class Message1SerializerDateTime extends Serializer[LocalDateTime]{
private val LocalDateTimeClass = classOf[LocalDateTime]
def deserialize(implicit format: Formats): PartialFunction[(TypeInfo, JValue), LocalDateTime] = {
case (TypeInfo(LocalDateTimeClass, _), json) => json match {
case JString(dt) => LocalDateTime.parse(dt)
case x => throw new MappingException("Can't convert " + x + " to LocalDateTime")
}
}
def serialize(implicit format: Formats): PartialFunction[Any, JValue] = {
case x: LocalDateTime => JString(x.toString)
}
}
object MessageTest extends App {
implicit val formats = /*Serialization.formats(NoTypeHints)*/DefaultFormats + new FieldSerializer[Child2]+new Message1SerializerDateTime
var c= new Child2("Mary", 5,List(1, 2),LocalDateTime.now())
c.number=1
// println("number"+c.number)
val ser = write(c)
println("Child class converted to string" +ser)
var obj=read[Child2](ser)
println("object of Child is "+obj)
println("str"+obj.str)
println("Num"+obj.Num)
println("MyList"+obj)
println("myDate"+obj.myDate)
println("number"+obj.number)
}
and it throws a MappingException:
Child class converted to string{"number":1,"str":"Mary","Num":5,"MyList":[1,2],"myDate":"2015-07-28T16:45:44.030"}
[error] (run-main-2) net.liftweb.json.MappingException: unknown error
net.liftweb.json.MappingException: unknown error
at net.liftweb.json.Extraction$.extract(Extraction.scala:46)
at net.liftweb.json.JsonAST$JValue.extract(JsonAST.scala:312)
at net.liftweb.json.Serialization$.read(Serialization.scala:58)
at MessageTest$.delayedEndpoint$MessageTest$1(MessageTest.scala:42)
at MessageTest$delayedInit$body.apply(MessageTest.scala:15)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:383)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at MessageTest$.main(MessageTest.scala:15)
at MessageTest.main(MessageTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 49938
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.<init>(BytecodeReadingParanamer.java:451)
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.<init>(BytecodeReadingParanamer.java:431)
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.<init>(BytecodeReadingParanamer.java:492)
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.<init>(BytecodeReadingParanamer.java:337)
at com.thoughtworks.paranamer.BytecodeReadingParanamer.lookupParameterNames(BytecodeReadingParanamer.java:100)
at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:75)
at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:68)
at net.liftweb.json.Meta$ParanamerReader$.lookupParameterNames(Meta.scala:89)
at net.liftweb.json.Meta$Reflection$.argsInfo$1(Meta.scala:237)
at net.liftweb.json.Meta$Reflection$.constructorArgs(Meta.scala:253)
at net.liftweb.json.Meta$Reflection$$anonfun$constructors$1.apply(Meta.scala:227)
at net.liftweb.json.Meta$Reflection$$anonfun$constructors$1.apply(Meta.scala:227)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at net.liftweb.json.Meta$Reflection$.constructors(Meta.scala:227)
at net.liftweb.json.Meta$.net$liftweb$json$Meta$$constructors$1(Meta.scala:97)
at net.liftweb.json.Meta$.mkConstructor$1(Meta.scala:124)
at net.liftweb.json.Meta$.fieldMapping$1(Meta.scala:151)
at net.liftweb.json.Meta$.net$liftweb$json$Meta$$toArg$1(Meta.scala:155)
at net.liftweb.json.Meta$$anonfun$net$liftweb$json$Meta$$constructors$1$1$$anonfun$apply$1.apply(Meta.scala:99)
at net.liftweb.json.Meta$$anonfun$net$liftweb$json$Meta$$constructors$1$1$$anonfun$apply$1.apply(Meta.scala:98)
at scala.collection.immutable.List.map(List.scala:278)
at net.liftweb.json.Meta$$anonfun$net$liftweb$json$Meta$$constructors$1$1.apply(Meta.scala:98)
at net.liftweb.json.Meta$$anonfun$net$liftweb$json$Meta$$constructors$1$1.apply(Meta.scala:97)
at scala.collection.immutable.List.map(List.scala:274)
at net.liftweb.json.Meta$.net$liftweb$json$Meta$$constructors$1(Meta.scala:97)
at net.liftweb.json.Meta$$anonfun$mappingOf$1.apply(Meta.scala:169)
at net.liftweb.json.Meta$$anonfun$mappingOf$1.apply(Meta.scala:161)
at net.liftweb.json.Meta$Memo.memoize(Meta.scala:199)
at net.liftweb.json.Meta$.mappingOf(Meta.scala:161)
at net.liftweb.json.Extraction$.net$liftweb$json$Extraction$$mkMapping$1(Extraction.scala:194)
at net.liftweb.json.Extraction$.net$liftweb$json$Extraction$$extract0(Extraction.scala:199)
at net.liftweb.json.Extraction$.extract(Extraction.scala:43)
at net.liftweb.json.JsonAST$JValue.extract(JsonAST.scala:312)
at net.liftweb.json.Serialization$.read(Serialization.scala:58)
at MessageTest$.delayedEndpoint$MessageTest$1(MessageTest.scala:42)
at MessageTest$delayedInit$body.apply(MessageTest.scala:15)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:383)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at MessageTest$.main(MessageTest.scala:15)
at MessageTest.main(MessageTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
Solution 2
case class Child2(var str:String,var Num:Int,MyList:List[Int],val myDate : LocalDateTime = LocalDateTime.now()){
var number:Int=555
}
class Message1Serializer extends Serializer[Child2]{
private val IntervalClass = classOf[Child2]
def deserialize(implicit format: Formats): PartialFunction[(TypeInfo, JValue), Child2] = {
case (TypeInfo(IntervalClass, _), json) => json match {
case JObject(
JField("str", JString(str)) :: JField("Num", JInt(num)) ::
JField("MyList", JArray(mylist)) :: (JField("myDate", JString(mydate)) ::
JField("number", JInt(number)) ::Nil)
) => {
val c = Child2(
str, num.intValue(), mylist.map(_.values.toString.toInt), LocalDateTime.parse(mydate)
)
c.number = number.intValue()
c
}
case x => throw new MappingException("Can't convert " + x + " to Interval")
}
}
def serialize(implicit format: Formats): PartialFunction[Any, JValue] = {
case x: Child2 =>
JObject(
JField("str", JString(x.str)) :: JField("Num", JInt(x.Num)) ::
JField("MyList", JArray(x.MyList.map(JInt(_)))) ::
JField("myDate", JString(x.myDate.toString)) ::
JField("number", JInt(x.number)) :: Nil
)
}
}
object MessageTest extends App
{
implicit val formats = /*Serialization.formats(NoTypeHints)*/DefaultFormats+new Message1Serializer
var c= new Child2("Mary", 5,List(1, 2),LocalDateTime.now())
c.number=1
// println("number"+c.number)
val ser = write(c)
println("Child class converted to string" +ser)
var obj=read[Child2](ser)
println("object of Child is "+obj)
println("str"+obj.str)
println("Num"+obj.Num)
println("MyList"+obj)
println("myDate"+obj.myDate)
println("number"+obj.number)
}
It also throws a MappingException:
Child class converted to string{"str":"Mary","Num":5,"MyList":[1,2],"myDate":"2015-07-28T17:09:31.512","number":1}
[error] (run-main-1) net.liftweb.json.MappingException: unknown error
net.liftweb.json.MappingException: unknown error
at net.liftweb.json.Extraction$.extract(Extraction.scala:46)
at net.liftweb.json.JsonAST$JValue.extract(JsonAST.scala:312)
at net.liftweb.json.Serialization$.read(Serialization.scala:58)
at MessageTest$.delayedEndpoint$MessageTest$1(MessageTest.scala:58)
at MessageTest$delayedInit$body.apply(MessageTest.scala:15)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:383)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at MessageTest$.main(MessageTest.scala:15)
at MessageTest.main(MessageTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 49938
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.<init>(BytecodeReadingParanamer.java:451)
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.<init>(BytecodeReadingParanamer.java:431)
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.<init>(BytecodeReadingParanamer.java:492)
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.<init>(BytecodeReadingParanamer.java:337)
at com.thoughtworks.paranamer.BytecodeReadingParanamer.lookupParameterNames(BytecodeReadingParanamer.java:100)
at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:75)
at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:68)
at net.liftweb.json.Meta$ParanamerReader$.lookupParameterNames(Meta.scala:89)
at net.liftweb.json.Meta$Reflection$.argsInfo$1(Meta.scala:237)
at net.liftweb.json.Meta$Reflection$.constructorArgs(Meta.scala:253)
at net.liftweb.json.Meta$Reflection$$anonfun$constructors$1.apply(Meta.scala:227)
at net.liftweb.json.Meta$Reflection$$anonfun$constructors$1.apply(Meta.scala:227)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at net.liftweb.json.Meta$Reflection$.constructors(Meta.scala:227)
at net.liftweb.json.Meta$.net$liftweb$json$Meta$$constructors$1(Meta.scala:97)
at net.liftweb.json.Meta$.mkConstructor$1(Meta.scala:124)
at net.liftweb.json.Meta$.fieldMapping$1(Meta.scala:151)
at net.liftweb.json.Meta$.net$liftweb$json$Meta$$toArg$1(Meta.scala:155)
at net.liftweb.json.Meta$$anonfun$net$liftweb$json$Meta$$constructors$1$1$$anonfun$apply$1.apply(Meta.scala:99)
at net.liftweb.json.Meta$$anonfun$net$liftweb$json$Meta$$constructors$1$1$$anonfun$apply$1.apply(Meta.scala:98)
at scala.collection.immutable.List.map(List.scala:278)
at net.liftweb.json.Meta$$anonfun$net$liftweb$json$Meta$$constructors$1$1.apply(Meta.scala:98)
at net.liftweb.json.Meta$$anonfun$net$liftweb$json$Meta$$constructors$1$1.apply(Meta.scala:97)
at scala.collection.immutable.List.map(List.scala:274)
at net.liftweb.json.Meta$.net$liftweb$json$Meta$$constructors$1(Meta.scala:97)
at net.liftweb.json.Meta$$anonfun$mappingOf$1.apply(Meta.scala:169)
at net.liftweb.json.Meta$$anonfun$mappingOf$1.apply(Meta.scala:161)
at net.liftweb.json.Meta$Memo.memoize(Meta.scala:199)
at net.liftweb.json.Meta$.mappingOf(Meta.scala:161)
at net.liftweb.json.Extraction$.net$liftweb$json$Extraction$$mkMapping$1(Extraction.scala:194)
at net.liftweb.json.Extraction$.net$liftweb$json$Extraction$$extract0(Extraction.scala:199)
at net.liftweb.json.Extraction$.extract(Extraction.scala:43)
at net.liftweb.json.JsonAST$JValue.extract(JsonAST.scala:312)
at net.liftweb.json.Serialization$.read(Serialization.scala:58)
at MessageTest$.delayedEndpoint$MessageTest$1(MessageTest.scala:58)
at MessageTest$delayedInit$body.apply(MessageTest.scala:15)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:383)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at MessageTest$.main(MessageTest.scala:15)
at MessageTest.main(MessageTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
Please help me resolve this issue. How should I write a serializer for Java 8 LocalDateTime?
I can vaguely remember having a similar issue, and I solved it
by parsing with the corresponding DateTimeFormatter class:
LocalDateTime.parse(dt, DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm"))
Correct the pattern for your case (the literal T has to be quoted in the pattern).
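For reference, here is a minimal, self-contained Java sketch of just the parsing step (class name is illustrative); the timestamp printed in the log above ("2015-07-28T16:45:44.030") is ISO-formatted, so the one-argument parse also works:
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class LocalDateTimeParseDemo {
    public static void main(String[] args) {
        String dt = "2015-07-28T16:45:44.030";
        // ISO_LOCAL_DATE_TIME is the default format, so no explicit formatter is needed here.
        LocalDateTime viaDefault = LocalDateTime.parse(dt);
        // Equivalent explicit pattern; note the quoted 'T' literal and the fractional seconds.
        LocalDateTime viaPattern = LocalDateTime.parse(
                dt, DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS"));
        System.out.println(viaDefault + " / " + viaPattern);
    }
}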

Troubles creating an index with custom analyzer using Jest

Jest provides a brilliant async API for Elasticsearch, and we find it very useful. However, sometimes it turns out that the resulting requests are slightly different from what we would expect.
Usually we didn't care, since everything was working fine, but in this case it was not.
I want to create an index with a custom ngram analyzer. When I do this following the Elasticsearch REST API docs, I call the below:
curl -XPUT 'localhost:9200/test' --data '
{
  "settings": {
    "number_of_shards": 3,
    "analysis": {
      "filter": {
        "keyword_search": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 15
        }
      },
      "analyzer": {
        "keyword": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "keyword_search"
          ]
        }
      }
    }
  }
}'
and then I confirm the analyzer is configured properly using:
curl -XGET 'localhost:9200/test/_analyze?analyzer=keyword&text=Expecting many tokens'
In response I receive multiple tokens like exp, expe, expec, and so on.
Now, using the Jest client, I put the config JSON into a file on my classpath; the content is exactly the same as the body of the PUT request above. I execute a Jest action constructed like this:
new CreateIndex.Builder(name)
.settings(
ImmutableSettings.builder()
.loadFromClasspath(
"settings.json"
).build().getAsMap()
).build();
The results:
Primo - I checked with tcpdump that what's actually posted to Elasticsearch is (pretty printed):
{
"settings.analysis.filter.keyword_search.max_gram": "15",
"settings.analysis.filter.keyword_search.min_gram": "3",
"settings.analysis.analyzer.keyword.tokenizer": "whitespace",
"settings.analysis.filter.keyword_search.type": "edge_ngram",
"settings.number_of_shards": "3",
"settings.analysis.analyzer.keyword.filter.0": "lowercase",
"settings.analysis.analyzer.keyword.filter.1": "keyword_search",
"settings.analysis.analyzer.keyword.type": "custom"
}
Secundo - the resulting index settings are:
{
  "test": {
    "settings": {
      "index": {
        "settings": {
          "analysis": {
            "filter": {
              "keyword_search": {
                "type": "edge_ngram",
                "min_gram": "3",
                "max_gram": "15"
              }
            },
            "analyzer": {
              "keyword": {
                "filter": [
                  "lowercase",
                  "keyword_search"
                ],
                "type": "custom",
                "tokenizer": "whitespace"
              }
            }
          },
          "number_of_shards": "3" <-- the only difference from the one created with the rest call
        },
        "number_of_shards": "3",
        "number_of_replicas": "0",
        "version": {"created": "1030499"},
        "uuid": "Glqf6FMuTWG5EH2jarVRWA"
      }
    }
  }
}
Tertio - checking the analyzer with curl -XGET 'localhost:9200/test/_analyze?analyzer=keyword&text=Expecting many tokens' I get just one token!
Question 1. What is the reason that Jest does not post my original settings json, but some processed one instead?
Question 2. Why the settings generated by Jest are not working?
Glad you found Jest useful, please see my answer below.
Question 1. What is the reason that Jest does not post my original
settings json, but some processed one instead?
It's not Jest but Elasticsearch's ImmutableSettings doing that; see:
Map test = ImmutableSettings.builder()
.loadFromSource("{\n" +
" \"settings\": {\n" +
" \"number_of_shards\": 3,\n" +
" \"analysis\": {\n" +
" \"filter\": {\n" +
" \"keyword_search\": {\n" +
" \"type\": \"edge_ngram\",\n" +
" \"min_gram\": 3,\n" +
" \"max_gram\": 15\n" +
" }\n" +
" },\n" +
" \"analyzer\": {\n" +
" \"keyword\": {\n" +
" \"type\": \"custom\",\n" +
" \"tokenizer\": \"whitespace\",\n" +
" \"filter\": [\n" +
" \"lowercase\",\n" +
" \"keyword_search\"\n" +
" ]\n" +
" }\n" +
" }\n" +
" }\n" +
" }\n" +
"}").build().getAsMap();
System.out.println("test = " + test);
outputs:
test = {
settings.analysis.filter.keyword_search.type=edge_ngram,
settings.number_of_shards=3,
settings.analysis.analyzer.keyword.filter.0=lowercase,
settings.analysis.analyzer.keyword.filter.1=keyword_search,
settings.analysis.analyzer.keyword.type=custom,
settings.analysis.analyzer.keyword.tokenizer=whitespace,
settings.analysis.filter.keyword_search.max_gram=15,
settings.analysis.filter.keyword_search.min_gram=3
}
Question 2. Why the settings generated by Jest are not working?
Because your usage of settings JSON/map is not the intended case. I have created this test to reproduce your case (it's a bit long but bear with me):
#Test
public void createIndexTemp() throws IOException {
String index = "so_q_26949195";
String settingsAsString = "{\n" +
" \"settings\": {\n" +
" \"number_of_shards\": 3,\n" +
" \"analysis\": {\n" +
" \"filter\": {\n" +
" \"keyword_search\": {\n" +
" \"type\": \"edge_ngram\",\n" +
" \"min_gram\": 3,\n" +
" \"max_gram\": 15\n" +
" }\n" +
" },\n" +
" \"analyzer\": {\n" +
" \"keyword\": {\n" +
" \"type\": \"custom\",\n" +
" \"tokenizer\": \"whitespace\",\n" +
" \"filter\": [\n" +
" \"lowercase\",\n" +
" \"keyword_search\"\n" +
" ]\n" +
" }\n" +
" }\n" +
" }\n" +
" }\n" +
"}";
Map settingsAsMap = ImmutableSettings.builder()
.loadFromSource(settingsAsString).build().getAsMap();
CreateIndex createIndex = new CreateIndex.Builder(index)
.settings(settingsAsString)
.build();
JestResult result = client.execute(createIndex);
assertTrue(result.getErrorMessage(), result.isSucceeded());
GetSettings getSettings = new GetSettings.Builder().addIndex(index).build();
result = client.execute(getSettings);
assertTrue(result.getErrorMessage(), result.isSucceeded());
System.out.println("SETTINGS SENT AS STRING settingsResponse = " + result.getJsonString());
Analyze analyze = new Analyze.Builder()
.index(index)
.analyzer("keyword")
.source("Expecting many tokens")
.build();
result = client.execute(analyze);
assertTrue(result.getErrorMessage(), result.isSucceeded());
Integer actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
assertTrue("Expected multiple tokens but got " + actualTokens, actualTokens > 1);
analyze = new Analyze.Builder()
.analyzer("keyword")
.source("Expecting single token")
.build();
result = client.execute(analyze);
assertTrue(result.getErrorMessage(), result.isSucceeded());
actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
assertTrue("Expected single token but got " + actualTokens, actualTokens == 1);
admin().indices().delete(new DeleteIndexRequest(index)).actionGet();
createIndex = new CreateIndex.Builder(index)
.settings(settingsAsMap)
.build();
result = client.execute(createIndex);
assertTrue(result.getErrorMessage(), result.isSucceeded());
getSettings = new GetSettings.Builder().addIndex(index).build();
result = client.execute(getSettings);
assertTrue(result.getErrorMessage(), result.isSucceeded());
System.out.println("SETTINGS AS MAP settingsResponse = " + result.getJsonString());
analyze = new Analyze.Builder()
.index(index)
.analyzer("keyword")
.source("Expecting many tokens")
.build();
result = client.execute(analyze);
assertTrue(result.getErrorMessage(), result.isSucceeded());
actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
assertTrue("Expected multiple tokens but got " + actualTokens, actualTokens > 1);
}
When you run it, you'll see that in the case where settingsAsMap is used, the actual settings are totally wrong (settings includes another settings element, which is your JSON, although they should have been merged), and so the analyze fails.
Why is this not the intended usage?
Simply because that's how Elasticsearch behaves in this situation. If the settings data is flattened (as the ImmutableSettings class does by default), then it should not have the top-level settings element, but it can have that top-level element if the data is not flattened (and that's why the test case with settingsAsString works).
tl;dr:
Your settings JSON should not include the top-level "settings" element if you run it through ImmutableSettings.
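Concretely, a sketch of the question's original CreateIndex call with the wrapper removed; settings.json is assumed here to contain only what was inside the "settings" element (number_of_shards plus the analysis block):
// settings.json (no top-level "settings" wrapper), assumed content:
// {
//   "number_of_shards": 3,
//   "analysis": { ... the filter/analyzer definitions from the question ... }
// }
CreateIndex createIndex = new CreateIndex.Builder(name)
        .settings(
                ImmutableSettings.builder()
                        .loadFromClasspath("settings.json")
                        .build()
                        .getAsMap())
        .build();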
