Thrift: Serialize + Deserialize changes object - java

I have a thrift struct something like this:
struct GeneralContainer {
1: required string identifier;
2: required binary data;
}
The idea is to be able to pass different types of thrift objects on a single "pipe", and still be able to deserialize at the other side correctly.
But serializing a GeneralContainer object and then deserializing it changes the contents of the data field. I am using TBinaryProtocol:
TSerializer serializer = new TSerializer(new TBinaryProtocol.Factory());
TDeserializer deserializer = new TDeserializer(new TBinaryProtocol.Factory());
GeneralContainer container = new GeneralContainer();
container.setIdentifier("my-thrift-type");
container.setData(ByteBuffer.wrap(serializer.serialize(myThriftTypeObject)));
byte[] serializedContainer = serializer.serialize(container);
GeneralContainer testContainer = new GeneralContainer();
deserializer.deserialize(testContainer, serializedContainer);
Assert.assertEquals(container, testContainer); // fails
My guess is that some sort of markers are getting messed up when we serialize an object containing a binary field using TBinaryProtocol. Is that correct? If so, what are my options for the protocol? My goal is to minimize the size of the resulting serialized byte array.
Thanks,
Aman

Tracked it down to a bug in Thrift 0.4's serialization. It works fine in Thrift 0.8.
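Since the question also asks about minimizing serialized size: once you are on a fixed Thrift version, you could also try TCompactProtocol, which usually produces smaller output than TBinaryProtocol for the same struct. A minimal sketch reusing the container from the question (exception handling omitted):
TSerializer compactSerializer = new TSerializer(new TCompactProtocol.Factory());
TDeserializer compactDeserializer = new TDeserializer(new TCompactProtocol.Factory());
// Both sides of the "pipe" must agree on the protocol factory
byte[] compactBytes = compactSerializer.serialize(container);
GeneralContainer roundTripped = new GeneralContainer();
compactDeserializer.deserialize(roundTripped, compactBytes);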

Related

How to convert protocol buffers binary to JSON using the descriptor in Java

I have a message with a field of the "Any" well-known type, which can hold a serialized protobuf message of any type.
I want to convert this field to its json representation.
I know the field names are required, and typically you would need the generated classes loaded in the app for this to work, but I am looking for a way to do it with the descriptors.
First, I parse the descriptors:
FileInputStream descriptorFile = new FileInputStream("/descriptor");
DescriptorProtos.FileDescriptorSet fdp = DescriptorProtos.FileDescriptorSet.parseFrom(descriptorFile);
Then I loop through the contained messages and find the correct one, using the "Any" type's URL, which contains the package and message name. I add this to a TypeRegistry, which is used to format the JSON.
JsonFormat.TypeRegistry.Builder typeRegistryBuilder = JsonFormat.TypeRegistry.newBuilder();
String messageNameFromUrl = member.getAny().getTypeUrl().split("/")[1];
for (DescriptorProtos.FileDescriptorProto file : fdp.getFileList()) {
    for (DescriptorProtos.DescriptorProto dp : file.getMessageTypeList()) {
        if (messageNameFromUrl.equals(String.format("%s.%s", file.getPackage(), dp.getName()))) {
            typeRegistryBuilder.add(dp.getDescriptorForType()); // Doesn't work.
            typeRegistryBuilder.add(MyConcreteGeneratedClass.getDescriptor()); // Works
            System.out.println(JsonFormat.printer().usingTypeRegistry(typeRegistryBuilder.build()).preservingProtoFieldNames().print(member.getAny()));
            return;
        }
    }
}
The problem seems to be that parsing the descriptor gives me access to DescriptorProtos.DescriptorProto objects, but I see no way to get the Descriptors.Descriptor object needed for the type registry. I can access the concrete class's descriptor with getDescriptor(), and that works, but I am trying to format the JSON at runtime by reading a pre-generated descriptor file from outside the app, so I do not have that concrete class available to call getDescriptor().
What would be even better is if I could use the "Any" field's type URL to resolve the Type object and use that to generate the JSON, since it also appears to have the field numbers and names as required for this process.
Any help is appreciated, thanks!
If you convert a DescriptorProtos.FileDescriptorProto to Descriptors.FileDescriptor, the latter has a getMessageTypes() method that returns List<Descriptor>.
The following is a snippet of Kotlin code taken from an open-source library I'm developing called okgrpc. It's a first-of-its-kind attempt to create a dynamic gRPC client/CLI in Java.
private fun DescriptorProtos.FileDescriptorProto.resolve(
    index: Map<String, DescriptorProtos.FileDescriptorProto>,
    cache: MutableMap<String, Descriptors.FileDescriptor>
): Descriptors.FileDescriptor {
    if (cache.containsKey(this.name)) return cache[this.name]!!
    return this.dependencyList
        .map { (index[it] ?: error("Unknown dependency: $it")).resolve(index, cache) }
        .let {
            val fd = Descriptors.FileDescriptor.buildFrom(this, *it.toTypedArray())
            cache[fd.name] = fd
            fd
        }
}
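To connect this back to the question, here is a minimal Java sketch of the same idea. It assumes the descriptor file's entries have no unresolved dependencies (otherwise resolve them first, as the Kotlin snippet above does; buildFrom throws DescriptorValidationException), and reuses fdp and member from the question:
JsonFormat.TypeRegistry.Builder registry = JsonFormat.TypeRegistry.newBuilder();
for (DescriptorProtos.FileDescriptorProto fileProto : fdp.getFileList()) {
    // Turn the parsed proto into a Descriptors.FileDescriptor
    Descriptors.FileDescriptor fd =
            Descriptors.FileDescriptor.buildFrom(fileProto, new Descriptors.FileDescriptor[0]);
    // getMessageTypes() yields the Descriptors.Descriptor objects the registry needs
    for (Descriptors.Descriptor descriptor : fd.getMessageTypes()) {
        registry.add(descriptor);
    }
}
String json = JsonFormat.printer()
        .usingTypeRegistry(registry.build())
        .preservingProtoFieldNames()
        .print(member.getAny());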

How to intercept Jackson JsonNodes deserialization

In my program I'm doing:
private static final ObjectMapper MAPPER = new GridJettyObjectMapper();
....
JsonNode node = MAPPER.readTree(content);
My JSON contains a lot of identical strings, and I would like to intercept the readTree() method and put cached Strings into the TextNodes (using a WeakHashMap, for example).
I hope this will save me a lot of memory. Right now my app just throws an OOME, and in the heap dump I see millions of identical Strings in TextNodes.
Any idea how to do this?
After some debugging I replaced
JsonNode node = MAPPER.readTree(content);
with a custom deserializer registered for my POJO type:
SimpleModule module = new SimpleModule();
module.addDeserializer(Pojo.class, new PojoDeserializer());
MAPPER.registerModule(module);
Pojo p = MAPPER.readValue(content, Pojo.class);
In PojoDeserializer I implemented logic that does not generate many TextNodes, using the JsonParser streaming API.
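If you still want per-field String deduplication when binding to POJOs, here is a minimal sketch of a caching deserializer. CachingStringDeserializer is a hypothetical name, not a Jackson class; it hooks String fields during readValue() (readTree() builds TextNodes directly and bypasses it). Note that a plain WeakHashMap<String, String> whose value is the key itself would never be evicted, hence the WeakReference:
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import java.io.IOException;
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

public class CachingStringDeserializer extends JsonDeserializer<String> {
    // Weak entries so cached strings can be collected once nothing else references them
    private static final Map<String, WeakReference<String>> CACHE =
            Collections.synchronizedMap(new WeakHashMap<>());

    @Override
    public String deserialize(JsonParser p, DeserializationContext ctxt) throws IOException {
        String value = p.getValueAsString();
        if (value == null) return null;
        WeakReference<String> ref = CACHE.get(value);
        String cached = (ref == null) ? null : ref.get();
        if (cached != null) return cached; // reuse the canonical instance
        CACHE.put(value, new WeakReference<>(value));
        return value;
    }
}
Register it the same way as PojoDeserializer above, e.g. module.addDeserializer(String.class, new CachingStringDeserializer()).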

Using Jedis, how to cache a Java object

Using the Redis Java client Jedis, how can I cache a Java object?
You should convert your object to a JSON string to store it, then read the JSON back and transform it into your object.
You can use Gson to do so.
// store
Gson gson = new Gson();
String json = gson.toJson(myObject);
jedis.set(key, json);
// restore
String json = jedis.get(key);
MyObject object = gson.fromJson(json, MyObject.class);
You can't store objects directly in Redis, so convert the object into a String and then put it in Redis.
To do that, your object must be serializable. Convert the object to a byte array, encode it (for example with Base64) to get a String, and store that in Redis.
When retrieving, reverse the process: decode the String back to a byte array (for example with Base64) and then convert it back to an object.
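A minimal sketch of that approach, assuming MyObject implements Serializable and key is your Redis key (exception handling omitted; with Jedis's binary set(byte[], byte[]) overload you could skip the Base64 step entirely):
// store: serialize to bytes, then Base64-encode so it can be stored as a String
ByteArrayOutputStream bos = new ByteArrayOutputStream();
try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
    oos.writeObject(myObject);
}
jedis.set(key, Base64.getEncoder().encodeToString(bos.toByteArray()));
// restore: decode and deserialize on the way back
byte[] bytes = Base64.getDecoder().decode(jedis.get(key));
try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
    MyObject restored = (MyObject) ois.readObject();
}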
I would recommend using a more convenient library for this: Redisson, a Redis-based framework for Java.
It has some advantages over Jedis:
You don't need to serialize/deserialize objects yourself each time
You don't need to manage connections yourself
You can work with Redis asynchronously
Redisson does all this for you, and more. It supports many popular codecs like Jackson JSON, Avro, Smile, CBOR, MsgPack, Kryo, FST, LZ4, Snappy and JDK Serialization.
RBucket<AnyObject> bucket = redisson.getBucket("anyObject");
// set an object
bucket.set(new AnyObject());
// get an object
AnyObject myObject = bucket.get();

Flume: Avro event deserializer To Elastic Search

I want to take a record created by the AVRO deserializer and send it to ElasticSearch. I realize I have to write custom code to do this.
Using the LITERAL option, I have the JSON schema, which is the first step toward using a GenericRecord. However, looking through the Avro Java API, I see no way of using GenericRecord for a single record. All the examples use DataFileReader.
In short, I can't get the fields from the Flume event.
Has anyone done this before?
TIA.
I was able to figure it out. I did the following:
// Get the schema from the Flume event headers
String strSchema = event.getHeaders().get("flume.avro.schema.literal");
// Get the body
byte[] body = event.getBody();
// Parse the Avro schema
Schema schema = new Schema.Parser().parse(strSchema);
// Get the decoder used to read the "record" from the event body
BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(body, null);
// Get the datum reader
GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
// Read the Avro record in object form
GenericRecord record = reader.read(null, decoder);
// Now you can iterate over the fields
for (Schema.Field field : schema.getFields()) {
    Object value = record.get(field.name());
    // Code to add the field to the JSON sent to ElasticSearch not listed
    // ...
}
This works well.

Getting an object into a ChannelBuffer

I've written a small HTTP server using Netty by following the example HTTP server, and now I'm trying to adapt it to my needs (a small app that should send JSON). I began by manually encoding my POJOs to JSON using Jackson and then using the StringEncoder to get a ChannelBuffer. Now I'm trying to generalize it slightly by extracting the bit that encodes the POJOs to JSON into an HttpContentEncoder, and I've managed to implement that more or less.
The part that I can't figure out is how to set the content on the HttpResponse. It expects a ChannelBuffer, but how do I get my object into a ChannelBuffer?
Edit
Say I have a handler with code like the one below, and an HttpContentEncoder that knows how to serialize SomeSerializableObject. How do I get my content (SomeSerializableObject) to the HttpContentEncoder? That's what I'm looking for.
SomeSerializableObject obj = ...
// This won't work because HttpMessage expects a ChannelBuffer
HttpResponse res = ...
res.setContent(obj);
Channel ch = ...
ch.write(res);
After looking into it a bit more, though, I'm unsure whether this is what HttpContentEncoder is meant to do, or whether it's rather for things like compression.
Most object serialization/deserialization libraries use InputStream and OutputStream. You could create a dynamic buffer (or a wrapped buffer for deserialization), wrap it with ChannelBufferOutputStream (or ChannelBufferInputStream), and feed that to the serialization library. For example:
// Deserialization
HttpMessage m = ...;
ChannelBuffer content = m.getContent();
InputStream in = new ChannelBufferInputStream(content);
Object contentObject = myDeserializer.decode(in);
// Serialization
HttpMessage m = ...;
Object contentObject = ...;
ChannelBuffer content = ChannelBuffers.dynamicBuffer();
OutputStream out = new ChannelBufferOutputStream(content);
mySerializer.encode(contentObject, out);
m.setContent(content);
If the serialization library allows you to use a byte array instead of streams, this can be much simpler using ChannelBuffer.array() and ChannelBuffer.arrayOffset().
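For the JSON case in the question, a minimal sketch combining this with Jackson (assuming Netty 3.x and the obj and ch variables from the question's snippet; exception handling omitted):
ObjectMapper mapper = new ObjectMapper();
ChannelBuffer content = ChannelBuffers.dynamicBuffer();
OutputStream out = new ChannelBufferOutputStream(content);
mapper.writeValue(out, obj); // Jackson writes the JSON straight into the buffer
HttpResponse response = new DefaultHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.OK);
response.setHeader(HttpHeaders.Names.CONTENT_TYPE, "application/json");
response.setHeader(HttpHeaders.Names.CONTENT_LENGTH, content.readableBytes());
response.setContent(content);
ch.write(response);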