I am using the Akka Persistence DynamoDB plugin (https://github.com/akka/akka-persistence-dynamodb), which doesn't have a read journal API like Cassandra's (Akka Persistence Query).
I can write journal data to DynamoDB; the event column is a Base64-encoded serialized Java object. My next task is to build the CQRS read side using AWS Lambda or the AWS Java API to read DynamoDB, which has to convert the event data to a human-readable format.
Event data:
rO0ABXNyAD9jb20uY2Fwb25lLmJhbmsuYWN0b3JzLlBlcnNpc3RlbnRCYW5rQWNjb3VudCRCYW5rQWNjb3VudENyZWF0ZWQrGoMniq0AywIAAUwAC2JhbmtBY2NvdW50dAA6TGNvbS9jYXBvbmUvYmFuay9hY3RvcnMvUGVyc2lzdGVudEJhbmtBY2NvdW50JEJhbmtBY2NvdW50O3hwc3IAOGNvbS5jYXBvbmUuYmFuay5hY3RvcnMuUGVyc2lzdGVudEJhbmtBY2NvdW50JEJhbmtBY2NvdW5011CikshX3ysCAAREAAdiYWxhbmNlTAAIY3VycmVuY3l0ABJMamF2YS9sYW5nL1N0cmluZztMAAJpZHEAfgAETAAEdXNlcnEAfgAEeHBAj0AAAAAAAHQAA0VVUnQAJDM5M2M2NmRiLTJhYmItNDEwNS04NWUyLWMwZjc3MzExMDNlM3QAB3JjYXJkaW4=
How can I convert the above Java object string value to a human-readable format? I tried using Java's ObjectInputStream, but I think I am doing something wrong.
Scala example:
val eventData:String = "rO0ABXNyAD9jb20uY2Fwb25lLmJhbmsuYWN0b3JzLlBlcnNpc3RlbnRCYW5rQWNjb3VudCRCYW5rQWNjb3VudENyZWF0ZWQrGoMniq0AywIAAUwAC2JhbmtBY2NvdW50dAA6TGNvbS9jYXBvbmUvYmFuay9hY3RvcnMvUGVyc2lzdGVudEJhbmtBY2NvdW50JEJhbmtBY2NvdW50O3hwc3IAOGNvbS5jYXBvbmUuYmFuay5hY3RvcnMuUGVyc2lzdGVudEJhbmtBY2NvdW50JEJhbmtBY2NvdW5011CikshX3ysCAAREAAdiYWxhbmNlTAAIY3VycmVuY3l0ABJMamF2YS9sYW5nL1N0cmluZztMAAJpZHEAfgAETAAEdXNlcnEAfgAEeHBAj0AAAAAAAHQAA0VVUnQAJDM5M2M2NmRiLTJhYmItNDEwNS04NWUyLWMwZjc3MzExMDNlM3QAB3JjYXJkaW4="
??? // then what? How do I convert the above string value to a human-readable format?
Thanks
Sri
OK, I was able to deserialize the object string data and convert it to JSON. Below is an example:
object DeserializeData extends App {
  import java.io.ByteArrayInputStream
  import java.io.ObjectInputStream
  import java.util.Base64
  import com.google.gson.Gson

  val base64encodedString = "rO0ABXNyAD9jb20uY2Fwb25lLmJhbmsuYWN0b3JzLlBlcnNpc3RlbnRCYW5rQWNjb3VudCRCYW5rQWNjb3VudENyZWF0ZWQrGoMniq0AlM3QAB3JjYXJkaW4="
  println("Base64 encoded string: " + base64encodedString)

  // Decode the Base64 text back into the raw serialized-object bytes
  val base64decodedBytes = Base64.getDecoder.decode(base64encodedString)
  val in = new ByteArrayInputStream(base64decodedBytes)
  val obin = new ObjectInputStream(in)

  // Deserialize; the event class must be on the classpath
  val `object` = obin.readObject
  println("Deserialised data: \n" + `object`.toString)
  println("Object class is " + `object`.getClass.toString)

  // Convert the deserialized object to JSON via Gson's reflection
  val gson = new Gson()
  val resp = gson.toJson(`object`)
  println(resp)
}
Update: a feature to read the DynamoDB journal is now implemented, so there is no need for any kind of clunky code: https://github.com/akka/akka-persistence-dynamodb/pull/114/files. Thank you, Lightbend.
Related
I am receiving messages in Protobuf format. I need to convert them to JSON quickly, as all my business logic is written to handle JSON-based POJOs.
byte[] request = ..; // msg received
// convert to intermediate POJO
AdxOpenRtb.BidRequest bidRequestProto = AdxOpenRtb.BidRequest.parseFrom(request, reg);
// convert intermediate POJO to json string.
// THIS STEP IS VERY SLOW
Printer printer = JsonFormat.printer().printingEnumsAsInts().omittingInsignificantWhitespace();
String jsonBody = printer.print(bidRequestProto);
// convert json string to final POJO format
BidRequest bidRequest = super.parse(jsonBody.getBytes());
The proto-object-to-JSON conversion step is very slow. Is there a faster approach?
Can I reuse the printer object? Is it thread-safe?
Note: these classes (AdxOpenRtb.BidRequest and BidRequest) are very complex, with deep hierarchies and many fields, but they hold similar data with slightly different field names and data types.
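On the reuse question: as far as I can tell, JsonFormat.Printer instances are immutable (each configuration call such as printingEnumsAsInts() returns a new printer), so a single configured printer should be safe to build once and share across threads. A minimal sketch, where the ProtoJsonSupport wrapper class is illustrative rather than part of the original code:

import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.MessageOrBuilder;
import com.google.protobuf.util.JsonFormat;

// Illustrative wrapper: holds one shared, pre-configured printer instead of
// rebuilding it per message.
public final class ProtoJsonSupport {

    // Built once at class-load time; immutable, so safe to share across threads.
    private static final JsonFormat.Printer PRINTER =
            JsonFormat.printer().printingEnumsAsInts().omittingInsignificantWhitespace();

    public static String toJson(MessageOrBuilder message)
            throws InvalidProtocolBufferException {
        return PRINTER.print(message);
    }
}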
I ran into some performance issues as well and ended up writing the QuickBuffers library. It generates dedicated JSON serialization methods (i.e. no reflection) and should give you a 10-30x speedup. It can be used side-by-side with Google's implementation. The code should look something like this:
// Initialization (objects can be reused if desired)
AdxOpenRtb.BidRequest bidRequestProto = AdxOpenRtb.BidRequest.newInstance();
ProtoSource protoSource = ProtoSource.newArraySource();
JsonSink jsonSink = JsonSink.newInstance().setWriteEnumsAsInts(true);
// Convert Protobuf to JSON
bidRequestProto.clearQuick() // or ::parseFrom if you want a new object
.mergeFrom(protoSource.setInput(request))
.writeTo(jsonSink.clear());
// Use the raw json bytes
RepeatedByte jsonBytes = jsonSink.getBytes();
JsonSinkBenchmark has some sample code for replacing the built-in JSON encoder with more battle-tested Gson/Jackson backends.
Edit: if you're doing this within a single process and are worried about performance, you're better off writing or generating code to convert the Java objects directly. JSON is not a very efficient format to go through.
I ended up using MapStruct, as suggested by some of you (@M.Deinum).
New code:
byte[] request = ..; // msg received
// convert to intermediate POJO
AdxOpenRtb.BidRequest bidRequestProto = AdxOpenRtb.BidRequest.parseFrom(request, reg);
// direct conversion from protobuf Pojo to my custom Pojo
BidRequest bidRequest = BidRequestMapper.INSTANCE.adxOpenRtbToBidRequest(bidRequestProto);
Code snippet of BidRequestMapper:
@Mapper(
        collectionMappingStrategy = CollectionMappingStrategy.ADDER_PREFERRED,
        nullValueCheckStrategy = NullValueCheckStrategy.ALWAYS,
        unmappedSourcePolicy = ReportingPolicy.WARN,
        unmappedTargetPolicy = ReportingPolicy.WARN)
@DecoratedWith(BidRequestMapperDecorator.class)
public abstract class BidRequestMapper {

    public static final BidRequestMapper INSTANCE = Mappers.getMapper(BidRequestMapper.class);

    @Mapping(source = "impList", target = "imp")
    @Mapping(target = "impOverride", ignore = true)
    @Mapping(target = "ext", ignore = true)
    public abstract BidRequest adxOpenRtbToBidRequest(AdxOpenRtb.BidRequest adxOpenRtb);

    ...
    ...
}
// Manage proto extensions
abstract class BidRequestMapperDecorator extends BidRequestMapper {

    private final BidRequestMapper delegate;

    BidRequestMapperDecorator(BidRequestMapper delegate) {
        this.delegate = delegate;
    }

    @Override
    public BidRequest adxOpenRtbToBidRequest(AdxOpenRtb.BidRequest bidRequestProto) {
        // Convert the protobuf msg to a basic bid request object
        BidRequest bidRequest = delegate.adxOpenRtbToBidRequest(bidRequestProto);
        ...
        ...
    }
}
The new approach is 20-50x faster in my local test environment.
It's worth mentioning that MapStruct is an annotation processor, which makes it much faster than similar libraries that use reflection, and it also has very good support for customization.
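For intuition, the implementation MapStruct generates at compile time is plain field-by-field copying with getter/setter calls and no reflection. A rough, illustrative sketch (the field names below are hypothetical, not the real OpenRTB schema):

// Roughly what MapStruct emits for the mapper above; field names hypothetical
public class BidRequestMapperImpl extends BidRequestMapper {

    @Override
    public BidRequest adxOpenRtbToBidRequest(AdxOpenRtb.BidRequest adxOpenRtb) {
        if (adxOpenRtb == null) {
            return null;
        }
        BidRequest bidRequest = new BidRequest();
        bidRequest.setId(adxOpenRtb.getId());   // hypothetical field copy
        bidRequest.setAt(adxOpenRtb.getAt());   // hypothetical field copy
        // ... one such copy (or a nested mapper call) per mapped field ...
        return bidRequest;
    }
}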
I have Java code to convert a JavaRDD to a Dataset and save it to HDFS:
Dataset<User> userDataset = sqlContext.createDataset(userRdd.rdd(), Encoders.bean(User.class));
userDataset.write().json("some_path");
The User class is defined in Scala:
case class User(val name: Name, val address: Seq[Address]) extends Serializable
case class Name(firstName: String, lastName: Option[String])
case class Address(address: String)
The code compiles and runs successfully and the file is saved to HDFS, but the User class in the output file has an empty schema:
val users = spark.read.json("some_path")
users.count // 100,000 which is same as "userRdd"
users.printSchema // users: org.apache.spark.sql.DataFrame = []
Why is Encoders.bean not working in this case?
Encoders.bean does not support Scala case classes; Encoders.product does. However, Encoders.product takes a TypeTag as a parameter, and a TypeTag cannot be initialized in Java, so I created a Scala object to provide the TypeTag:
import scala.reflect.runtime.universe._
object MyTypeTags {
val UserTypeTag: TypeTag[User] = typeTag[User]
}
Then, in the Java code:
Dataset<User> userDataset = sqlContext.createDataset(userRdd.rdd(), Encoders.product(MyTypeTags.UserTypeTag()));
I am new to Scala. I was trying to parse an API response in Scala. The API response is in the format:
{"items":[{"name":"john", "time":"2017-05-11T13:51:34.037232", "topic":"india", "reviewer":{"id":"12345","name":"jack"}},
{"name":"Mary", "time":"2017-05-11T13:20:26.001496", "topic":"math", "reviewer":{"id":"5678","name":"Tom"}}]}
My goal is to populate a list of reviewer ids from the JSON response. I tried to create a JSON object from the response with
val jsonObject= parse(jsonResponse.getContentString()).getOrElse(Json.empty)
but I couldn't get the reviewer ids out of it. I even tried iterating over the JSON object, but that didn't work.
I am not familiar with circe, but here is how you would do it with spray-json.
import spray.json._
import DefaultJsonProtocol._
val jsonResponse = """{"items":[{"name":"john", "time":"2017-05-11T13:51:34.037232", "topic":"india", "reviewer":{"id":"12345","name":"jack"}},{"name":"Mary", "time":"2017-05-11T13:20:26.001496", "topic":"math", "reviewer":{"id":"5678","name":"Tom"}}]}"""
First, define the schema using case classes:
case class Reviewer(id: String, name: String)
case class Item(name: String, time: String, topic: String, reviewer: Reviewer)
case class Items(items: Array[Item])
And their implicit conversions:
implicit val reviewerImp: RootJsonFormat[Reviewer] = jsonFormat2(Reviewer)
implicit val itemConverted: RootJsonFormat[Item] = jsonFormat4(Item)
implicit val itemsConverted: RootJsonFormat[Items] = jsonFormat1(Items)
Then parsing is just this:
val obj = jsonResponse.parseJson.convertTo[Items]
Finally, get the ids of the reviewers:
val reviewers = obj.items.map(it => it.reviewer.id)
You mentioned Play, so here's how you could do it with Play JSON:
case class Reviewer(id: String, name: String)
object Reviewer { implicit val format = Json.format[Reviewer] }
Once you have those set up, you can either serialize:
val json: JsValue = Json.toJson(reviewerObject)
val json: JsObject = Json.toJson(reviewerObject).as[JsObject]
val json: String = Json.toJson(reviewerObject).toString // valid JSON string
Or parse:
val reviewer: Reviewer = Json.parse(reviewerJsonString).as[Reviewer]
val validates: Boolean = Json.parse(reviewerJsonString).validate[Reviewer].isSuccess
package sample;

import java.util.ArrayList;
import java.util.List;

import org.apache.commons.lang.SerializationUtils;

import sample.ProtoObj.Attachment;

public class Main {
    public static void main(String[] args) {
        // Build the plain-Java object and serialize it with Java serialization
        POJO pojo = new POJO();
        pojo.setContent("content");
        List<sample.POJO.Attachment> att = new ArrayList<POJO.Attachment>();
        sample.POJO.Attachment attach = pojo.new Attachment();
        attach.setName("Attachment Name");
        attach.setId("0e068652dbd9");
        attach.setSize(1913558);
        att.add(attach);
        pojo.setAttach(att);
        byte[] byyy = SerializationUtils.serialize(pojo);
        System.out.println("Size of the POJO ::: " + byyy.length);

        // Build the equivalent object and serialize it with Protobuf
        ProtoObj tc = new ProtoObj();
        List<Attachment> attachList = new ArrayList<ProtoObj.Attachment>();
        Attachment attach1 = tc.new Attachment();
        attach1.setName("Attachment Name");
        attach1.setId("0e068652dbd9");
        attach1.setSize(1913558);
        attachList.add(attach1);
        tc.setContent("content");
        tc.setAttach(attachList);
        byte[] bhh = tc.getProto(tc);
        System.out.println("Size of the PROTO ::: " + bhh.length);
    }
}
I used the above program to compare the size of the encoded/serialized object using Protobuf versus a plain POJO. Both objects hold the same data, but the output shows a drastic difference in size.
Output:
Size of the POJO ::: 336
Size of the PROTO ::: 82
I have also read the link below on how the Protobuf wire format affects the size of the encoded object:
https://developers.google.com/protocol-buffers/docs/encoding
But I'm unable to understand it. Could someone explain it simply?
Protobuf doesn't send the schema along with the data, so both sides need to have the schema in order to deserialize it.
Because of that, the encoding can simply pack each field's value right after the previous one, something like this:
AttachmentName0e068652dbd91913558
All of this is in a binary format. The same data in JSON would look like:
{"name": "AttachmentName", "id": "0e068652dbd9", "size": "1913558"}
As you can see the schema is encoded in the serialised message itself.
I'm not intimately familiar with Java's SerializationUtils, but standard Java serialization writes class descriptors (class names and field metadata) into the stream as well, which is why you see this size difference.
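To make the size arithmetic concrete, protobuf-java's CodedOutputStream can report what each field costs on the wire: a one-byte tag plus a length-delimited payload for strings, and a varint for ints. A small sketch, assuming field numbers 1 to 3 for the attachment (the actual numbers depend on the .proto definition):

import com.google.protobuf.CodedOutputStream;

public class WireSizeDemo {
    public static void main(String[] args) {
        // tag (1 byte) + length (1 byte) + 15 UTF-8 bytes = 17
        int name = CodedOutputStream.computeStringSize(1, "Attachment Name");
        // tag (1 byte) + length (1 byte) + 12 UTF-8 bytes = 14
        int id = CodedOutputStream.computeStringSize(2, "0e068652dbd9");
        // tag (1 byte) + 3-byte varint (1913558 fits in 21 bits) = 4
        int size = CodedOutputStream.computeInt32Size(3, 1913558);
        System.out.println("attachment fields: " + (name + id + size) + " bytes"); // 35
    }
}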
I am inserting records into Hazelcast from a C application using the memcached client library APIs, where a record is as follows:
typedef struct _activeClient
{
    char ID[25];
    int  IP;
    char aMethod[16];
} activeClient;
Now I am trying to read the same record using the Hazelcast Java native APIs. Here is my Java program:
IMap<String, MemcacheEntry> mapInst = client.getMap("hz_memcache_ABC_MAP");
System.out.println("Map Size:" + mapInst.size());
String key = "70826892122991";
MemcacheEntry tmpValRec = mapInst.get(key);
System.out.println("Key:" + key + " ID:" + tmpValRec.getValue());
Here tmpValRec.getValue() prints the record content as a single string, but I want to extract each member value from tmpValRec into my own Java class object. Here is the class:
class ActiveClients
{
    String ueID;
    int Ip;
    String aMethod;

    ActiveClients()
    {
        ueID = "";
        Ip = 0;
        aMethod = "";
    }
}
Pointing me to an example would be a great help.
I guess the only option is to parse the string to deserialize your object. I know this is a pain, but I don't see a better alternative, unless of course you store a blob as the value in memcached, where the blob is the serialized content of the class.
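A minimal sketch of that parsing, assuming MemcacheEntry.getValue() hands back the raw struct bytes, little-endian byte order, and no compiler padding between the fields (a real compiler may insert padding after ID[25] to align the int, so verify the layout with sizeof on the C side):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class ActiveClientParser {

    // Parses the fixed-layout C struct: char ID[25], int IP, char aMethod[16]
    static ActiveClients parse(byte[] value) {
        ByteBuffer buf = ByteBuffer.wrap(value).order(ByteOrder.LITTLE_ENDIAN);
        ActiveClients client = new ActiveClients();
        client.ueID = readFixedString(buf, 25);    // char ID[25]
        client.Ip = buf.getInt();                  // int IP
        client.aMethod = readFixedString(buf, 16); // char aMethod[16]
        return client;
    }

    // Reads a fixed-size char array and cuts it at the first NUL terminator
    private static String readFixedString(ByteBuffer buf, int len) {
        byte[] bytes = new byte[len];
        buf.get(bytes);
        int end = 0;
        while (end < len && bytes[end] != 0) {
            end++;
        }
        return new String(bytes, 0, end, StandardCharsets.US_ASCII);
    }
}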