Avro: ReflectDatumWriter does not output schema information - java

See the following sample code:
User datum = new User("a123456", "my.email#world.com");
Schema schema = ReflectData.get().getSchema(datum.getClass());
DatumWriter<Object> writer = new ReflectDatumWriter<>(schema);
ByteArrayOutputStream output = new ByteArrayOutputStream();
Encoder encoder = EncoderFactory.get().binaryEncoder(output, null);
writer.write(datum, encoder);
encoder.flush();
byte[] bytes = output.toByteArray();
System.out.println(new String(bytes));
which produces:
a123456$my.email#world.com
I had presumed that all Avro writers would publish the schema information as well as the data, but this does not.
I can successfully get the schema printed if I use the GenericDatumWriter in combination with a DataFileWriter, but I wish to use the ReflectDatumWriter as I don't want to construct a GenericRecord myself (I want the library to do this).
How do I get the schema serialized as well?

I solved this myself: you need to use a DataFileWriter, as its create() method writes the schema to the output.
The solution is to use it in conjunction with a ByteArrayOutputStream:
Schema schema = ReflectData.get().getSchema(User.class);
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
ByteArrayOutputStream output = new ByteArrayOutputStream();
dataFileWriter.create(schema, output);
GenericRecord user = createGenericRecord(schema);
dataFileWriter.append(user);
dataFileWriter.close();
byte[] bytes = output.toByteArray();
System.out.println(new String(bytes));
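For completeness, DataFileWriter also accepts a ReflectDatumWriter, so the same schema-carrying container format can be produced without constructing a GenericRecord by hand. A minimal sketch, assuming the same User class as in the question:
Schema schema = ReflectData.get().getSchema(User.class);
DatumWriter<User> datumWriter = new ReflectDatumWriter<>(schema);
DataFileWriter<User> dataFileWriter = new DataFileWriter<>(datumWriter);
ByteArrayOutputStream output = new ByteArrayOutputStream();
dataFileWriter.create(schema, output); // create() writes the schema header into the stream
dataFileWriter.append(new User("a123456", "my.email#world.com"));
dataFileWriter.close();
byte[] bytes = output.toByteArray(); // contains the schema as well as the datum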

Related

Convert Avro into Byte and Store Byte data into MySQL

I have an Avro schema file customer.avsc. I already successfully created the Avro object using builder, and I can read the avro object. I am wondering how to convert the customer avro object into Byte and store it in the database. Thanks a lot!
public static void main(String[] args) {
// we can now build a customer in a "safe" way
Customer.Builder customerBuilder = Customer.newBuilder();
customerBuilder.setAge(30);
customerBuilder.setFirstName("Mark");
customerBuilder.setLastName("Simpson");
customerBuilder.setAutomatedEmail(true);
customerBuilder.setHeight(180f);
customerBuilder.setWeight(90f);
Customer customer = customerBuilder.build();
System.out.println(customer);
System.out.println(111111);
// write it out to a file
final DatumWriter<Customer> datumWriter = new SpecificDatumWriter<>(Customer.class);
try (DataFileWriter<Customer> dataFileWriter = new DataFileWriter<>(datumWriter)) {
dataFileWriter.create(customer.getSchema(), new File("customer-specific.avro"));
dataFileWriter.append(customer);
System.out.println("successfully wrote customer-specific.avro");
} catch (IOException e) {
e.printStackTrace();
}
}
I am using BinaryEncoder to solve this problem. This way the Avro object can be converted into bytes and saved into the MySQL database. Then, when receiving the data from Kafka (byte -> MySQL -> Debezium Connector -> Kafka -> Consumer API), I can decode the payload of that byte column back into an Avro / Java object again with the same schema.
Here is the code.
Customer.Builder customerBuilder = Customer.newBuilder();
customerBuilder.setAge(20);
customerBuilder.setFirstName("first");
customerBuilder.setLastName("last");
customerBuilder.setAutomatedEmail(true);
customerBuilder.setHeight(180f);
customerBuilder.setWeight(90f);
Customer customer = customerBuilder.build();
DatumWriter<SpecificRecord> writer = new SpecificDatumWriter<SpecificRecord>(
customer.getSchema());
ByteArrayOutputStream out = new ByteArrayOutputStream();
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
writer.write(customer, encoder);
encoder.flush();
out.close();
byte[] serializedBytes = out.toByteArray();
System.out.println("Sending message in bytes : " + serializedBytes);
// //String serializedHex = Hex.encodeHexString(serializedBytes);
// //System.out.println("Serialized Hex String : " + serializedHex);
// KeyedMessage<String, byte[]> message = new KeyedMessage<String, byte[]>("page_views", serializedBytes);
// producer.send(message);
// producer.close();
DatumReader<Customer> userDatumReader = new SpecificDatumReader<Customer>(Customer.class);
Decoder decoder = DecoderFactory.get().binaryDecoder(serializedBytes, null);
SpecificRecord datum = userDatumReader.read(null, decoder);
System.out.println(datum);
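Since the goal is to store the bytes in MySQL, a minimal JDBC sketch follows. The customer_avro table, its payload BLOB column, and the jdbcUrl/dbUser/dbPassword variables are hypothetical names for illustration, not part of the original code:
// Hypothetical table: CREATE TABLE customer_avro (id BIGINT PRIMARY KEY, payload BLOB)
try (Connection conn = DriverManager.getConnection(jdbcUrl, dbUser, dbPassword);
PreparedStatement ps = conn.prepareStatement("INSERT INTO customer_avro (id, payload) VALUES (?, ?)")) {
ps.setLong(1, 1L);
ps.setBytes(2, serializedBytes); // the Avro-encoded byte[] from above
ps.executeUpdate();
}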

Creating objects from Primitive avro schema

Suppose I have a schema in avro like this
{ "type" : "string" }
How should I create an object from this schema in Java?
I did not find a way to do it directly with the Java Avro lib, but you can still do this:
public static byte[] jsonToAvro(String json, Schema schema) throws IOException {
DatumReader<Object> reader = new GenericDatumReader<>(schema);
GenericDatumWriter<Object> writer = new GenericDatumWriter<>(schema);
ByteArrayOutputStream output = new ByteArrayOutputStream();
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, json);
Encoder encoder = EncoderFactory.get().binaryEncoder(output, null);
Object datum = reader.read(null, decoder);
writer.write(datum, encoder);
encoder.flush();
return output.toByteArray();
}
Schema PRIMITIVE = new Schema.Parser().parse("{ \"type\" : \"string\" }");
byte[] b = jsonToAvro("\"" + mystring + "\"", PRIMITIVE);
from How to avro binary encode my json string to a byte array?
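Reading a primitive-encoded value back works the same way in reverse. A minimal sketch, assuming the PRIMITIVE schema and the byte array b from above (Avro returns strings as org.apache.avro.util.Utf8):
DatumReader<Object> primitiveReader = new GenericDatumReader<>(PRIMITIVE);
Decoder binaryDecoder = DecoderFactory.get().binaryDecoder(b, null);
Object value = primitiveReader.read(null, binaryDecoder); // a Utf8 instance for a string schema
System.out.println(value.toString());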

Avro write and read works on one machine and not on other

Here is some Avro code that runs on one machine but fails on the other with an exception.
We are not able to figure out what's wrong here.
Here is the code that is causing the problem.
Class<?> clazz = obj.getClass();
ReflectData rdata = ReflectData.AllowNull.get();
Schema schema = rdata.getSchema(clazz);
ByteArrayOutputStream os = new ByteArrayOutputStream();
Encoder encoder = EncoderFactory.get().binaryEncoder(os, null);
DatumWriter<T> writer = new ReflectDatumWriter<T>(schema, rdata);
writer.write(obj, encoder);
encoder.flush();
byte[] bytes = os.toByteArray();
String binaryString = new String (bytes, "ISO-8859-1");
BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(binaryString.getBytes("ISO-8859-1"), null);
GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord> (schema);
GenericRecord record = datumReader.read(null, decoder);
Exception is:
org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -32
at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:336)
at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:437)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:427)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:189)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:187)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:263)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:216)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:173)
Adding -Dfile.encoding=UTF-8 to the Tomcat JVM parameters helped us resolve the issue.
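The underlying fragility here is the byte[] -> String -> byte[] round trip: if any step falls back to a platform-dependent default charset, binary Avro data can be corrupted. A minimal sketch that sidesteps the issue by feeding the bytes to the decoder directly, assuming the same schema and bytes as above:
// No String conversion, so no charset is involved at all.
BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
GenericRecord record = datumReader.read(null, decoder);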

Json String to Java Object Avro

I am trying to convert a Json string into a generic Java Object, with an Avro Schema.
Below is my code.
String json = "{\"foo\": 30.1, \"bar\": 60.2}";
String schemaLines = "{\"type\":\"record\",\"name\":\"FooBar\",\"namespace\":\"com.foo.bar\",\"fields\":[{\"name\":\"foo\",\"type\":[\"null\",\"double\"],\"default\":null},{\"name\":\"bar\",\"type\":[\"null\",\"double\"],\"default\":null}]}";
InputStream input = new ByteArrayInputStream(json.getBytes());
DataInputStream din = new DataInputStream(input);
Schema schema = Schema.parse(schemaLines);
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
DatumReader<Object> reader = new GenericDatumReader<Object>(schema);
Object datum = reader.read(null, decoder);
I get "org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_NUMBER_FLOAT" Exception.
The same code works, if I don't have unions in the schema.
Can someone please explain and give me a solution.
For anyone using Avro 1.8.2: JsonDecoder is no longer directly instantiable outside the package org.apache.avro.io. You can use DecoderFactory for it, as shown in the following code:
String schemaStr = "<some json schema>";
String genericRecordStr = "<some json record>";
Schema.Parser schemaParser = new Schema.Parser();
Schema schema = schemaParser.parse(schemaStr);
DecoderFactory decoderFactory = new DecoderFactory();
Decoder decoder = decoderFactory.jsonDecoder(schema, genericRecordStr);
DatumReader<GenericData.Record> reader =
new GenericDatumReader<>(schema);
GenericRecord genericRecord = reader.read(null, decoder);
Thanks to Reza, I found this webpage.
It introduces how to convert a JSON string into an Avro object.
http://rezarahim.blogspot.com/2013/06/import-org_26.html
The key of his code is:
static byte[] fromJsonToAvro(String json, String schemastr) throws Exception {
InputStream input = new ByteArrayInputStream(json.getBytes());
DataInputStream din = new DataInputStream(input);
Schema schema = Schema.parse(schemastr);
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
DatumReader<Object> reader = new GenericDatumReader<Object>(schema);
Object datum = reader.read(null, decoder);
GenericDatumWriter<Object> w = new GenericDatumWriter<Object>(schema);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Encoder e = EncoderFactory.get().binaryEncoder(outputStream, null);
w.write(datum, e);
e.flush();
return outputStream.toByteArray();
}
String json = "{\"username\":\"miguno\",\"tweet\":\"Rock: Nerf paper, scissors is fine.\",\"timestamp\": 1366150681 }";
String schemastr ="{ \"type\" : \"record\", \"name\" : \"twitter_schema\", \"namespace\" : \"com.miguno.avro\", \"fields\" : [ { \"name\" : \"username\", \"type\" : \"string\", \"doc\" : \"Name of the user account on Twitter.com\" }, { \"name\" : \"tweet\", \"type\" : \"string\", \"doc\" : \"The content of the user's Twitter message\" }, { \"name\" : \"timestamp\", \"type\" : \"long\", \"doc\" : \"Unix epoch time in seconds\" } ], \"doc:\" : \"A basic schema for storing Twitter messages\" }";
byte[] avroByteArray = fromJsonToAvro(json,schemastr);
Schema schema = Schema.parse(schemastr);
DatumReader<GenericRecord> reader1 = new GenericDatumReader<GenericRecord>(schema);
Decoder decoder1 = DecoderFactory.get().binaryDecoder(avroByteArray, null);
GenericRecord result = reader1.read(null, decoder1);
With Avro 1.4.1, this works:
private static GenericData.Record parseJson(String json, String schema)
throws IOException {
Schema parsedSchema = Schema.parse(schema);
Decoder decoder = new JsonDecoder(parsedSchema, json);
DatumReader<GenericData.Record> reader =
new GenericDatumReader<>(parsedSchema);
return reader.read(null, decoder);
}
Might need some tweaks for later Avro versions.
As it was already mentioned here in the comments, JSON that is understood by AVRO libs is a bit different from a normal JSON object. Specifically, UNION type is wrapped into a nested object structure: "union_field": {"type": "value"}.
So if you want to convert "normal" JSON to Avro, you'll have to use a third-party library, at least for now.
https://github.com/allegro/json-avro-converter - Java project that claims to support unions, not sure about default values.
https://github.com/agolovenko/json-to-avro-converter - this is my project, although written in Scala, still usable from Java. Supports unions, default values, base64 binary data...
Your schema does not match the schema of the JSON string. You need a different schema that, at the position of the error, has a plain double instead of a union. That schema should then be used as the writer schema, while you can freely use the other one as the reader schema.
The problem is not the code but the format of the JSON: union values have to be wrapped in an object keyed by their type:
String json = "{\"foo\": {\"double\": 30.1}, \"bar\": {\"double\": 60.2}}";
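To verify, the decoding code from the question runs cleanly against the wrapped-union form. A minimal sketch, reusing the schemaLines string from the question and the corrected json above:
Schema schema = new Schema.Parser().parse(schemaLines);
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, json); // json in the wrapped-union form
DatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
GenericRecord record = reader.read(null, decoder);
System.out.println(record); // {"foo": 30.1, "bar": 60.2}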

Different sizes of file in storing String v/s ByteArray in File android

I am using this approach for storing data in a file from the response of the server.
ByteArrayOutputStream outstream = new ByteArrayOutputStream();
response.getEntity().writeTo(outstream);
byte[] responseBody = outstream.toByteArray();
String data = new String(responseBody);
FileOutputStream out = new FileOutputStream(new File(my_path));
out.write(data.getBytes());
out.flush();
out.close();
It's working fine; my file gets created and its size is 3786 bytes.
Now consider this:
ByteArrayOutputStream outstream = new ByteArrayOutputStream();
response.getEntity().writeTo(outstream);
byte[] responseBody = outstream.toByteArray();
FileOutputStream out = new FileOutputStream(new File(my_path));
out.write(responseBody);
out.flush();
out.close();
This gives a file size of 1993 bytes.
Can anybody help me understand this? Does new String(responseBody) do something to the response bytes, like some encoding?
Any help would be appreciated.
Yes, constructing a String from bytes decodes the bytes according to the current default character encoding (if one is not explicitly specified). Also String.getBytes() does the same in reverse (and may not necessarily produce the same sequence of bytes that was used to create it).
A String holds text. If your data is raw binary data and is intended to be treated as such, you should not be storing it in a String, you should be storing it in a byte[].
There is no need to have String data at all in that first bit, just write the byte[] to the file:
byte[] responseBody = outstream.toByteArray();
String data = new String(responseBody);
...
out.write(data.getBytes());
Can just be:
byte[] responseBody = outstream.toByteArray();
...
out.write(responseBody);
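The corruption is easy to demonstrate: bytes that are not valid in the chosen charset are replaced during decoding and do not survive the round trip. A small illustrative sketch (the byte values are arbitrary and deliberately invalid UTF-8):
byte[] original = { (byte) 0x89, 0x50, (byte) 0xC3, 0x28 }; // not valid UTF-8
byte[] roundTripped = new String(original, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
// The undecodable bytes became U+FFFD replacement characters, so content and length both change.
System.out.println(Arrays.equals(original, roundTripped)); // false
System.out.println(original.length + " vs " + roundTripped.length); // 4 vs 8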
