JSON to Avro conversion - Java

I'm converting JSON to Avro. My JSON data is in a JSONArray, and I'm having trouble converting it into a byte array.
Below is my code:
static byte [] fromJsonToAvro(JSONArray json, String schemastr) throws Exception {
ExcelToJson ejj = new ExcelToJson();
List<String> list = new ArrayList<String>();
if (json != null) {
int len = json.length();
for (int i=0;i<len;i++){
list.add(json.get(i).toString());
}
}
InputStream input = new ByteArrayInputStream(list.getBytes()); //json.toString().getBytes()
DataInputStream din = new DataInputStream(input);
.
.
.//rest of the logic
So how can I do this? How do I convert a JSONArray object to bytes (i.e., how do I use the getBytes() method with a JSONArray)? The code above gives an error at list.getBytes(), saying getBytes() is undefined for List.

Avro works at the record level, bound to a schema. I don't think there's such a concept as "convert this JSON fragment to bytes for an Avro field independent of any schema or record".
Assuming the array is part of a larger JSON record, if you're starting with a string of the record, you could do
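import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;

public static byte[] jsonToAvro(String json, String schemaStr) throws IOException {
    Schema schema = new Schema.Parser().parse(schemaStr);
    DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
    // Encode the JSON text explicitly as UTF-8 instead of the platform default
    InputStream input = new ByteArrayInputStream(json.getBytes(StandardCharsets.UTF_8));
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    DataFileWriter<GenericRecord> writer =
            new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
    try {
        // Writes an Avro container file: a header holding the schema, then data blocks
        writer.create(schema, output);
        // The JSON decoder parses the input according to the schema
        Decoder decoder = DecoderFactory.get().jsonDecoder(schema, input);
        while (true) {
            GenericRecord datum;
            try {
                datum = reader.read(null, decoder);
            } catch (EOFException eofe) {
                break; // no more records in the input
            }
            writer.append(datum);
        }
        writer.flush();
        return output.toByteArray();
    } finally {
        writer.close();
        input.close();
    }
}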
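For the question itself: the compile error comes from calling getBytes() on a List<String>; only String has that method. A minimal sketch, assuming each list element holds one complete JSON record (as the question's loop produces): join the records into one String first, one per line, and encode that. Because jsonToAvro above takes a String, the joined text can be passed to it directly; the bytes are only needed if you build the InputStream yourself as in the question. That Avro's JSON decoder reads newline-separated records one after another is an assumption based on the read loop above, so check it against your Avro version. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

class JsonRecords {
    // List<String> has no getBytes(); only String does. Join the
    // per-record JSON strings into a single String, one record per line.
    static String joinRecords(List<String> records) {
        return String.join("\n", records);
    }

    // Then encode that single String, explicitly as UTF-8.
    static byte[] toBytes(List<String> records) {
        return joinRecords(records).getBytes(StandardCharsets.UTF_8);
    }
}
```

Usage with the answer's method: byte[] avroBytes = jsonToAvro(JsonRecords.joinRecords(list), schemastr);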

For an online JSON to Avro converter, see the following URL:
http://avro4s-ui.landoop.com
It uses the avro4s library, which offers many conversions, including JSON => Avro.

This discussion is likely useful:
http://mail-archives.apache.org/mod_mbox/avro-user/201209.mbox/%3CCALEq1Z8s1sfaAVB7YE2rpZ=v3q1V_h7Vm39h0HsOzxJ+qfQRSg#mail.gmail.com%3E
The gist is that there is a special JSON schema, and you can use JsonReader/Writer to convert to and from it. The JSON schema you should use is defined here:
https://github.com/apache/avro/blob/trunk/share/schemas/org/apache/avro/data/Json.avsc

Related

How to convert InputStream to JsonArray Object using Java

I'm trying to convert an InputStream to a JSONArray object, but I'm not getting the JSON objects properly. Here are the records in my input stream:
{"id":4,"productId":9949940,"data":"product data 1","productPrice":"653.90"}
{"id":5,"productId":4940404,"data":"product data 2","productPrice":"94.12"}
I'm getting extra commas after each item, including the last record; my Java code is below. Can someone please help me resolve this issue? Thanks in advance!
Product.java
public void getProduct() {
String bucketName = "myProductBucket";
String key = "products/product-file";
StringBuilder sb = null;
JSONArray jsonArray = new JSONArray();
try(InputStream inputStream = s3Service.getObjectFromS3(bucketName, key);) {
sb = new StringBuilder();
sb.append("[");
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
String line;
while((line = reader.readLine()) != null) {
sb.append(line).append(",");
}
sb.append("]");
} catch (IOException e) {
e.printStackTrace();
}
System.out.println(sb.toString());
}
Output:
[{"id":4,"productId":9949940,"data":"product data 1","productPrice":"653.90"},,
{"id":5,"productId":4940404,"data":"product data 2","productPrice":"94.12"},]
Expected Output:
[{"id":4,"productId":9949940,"data":"product data 1","productPrice":"653.90"},
{"id":5,"productId":4940404,"data":"product data 2","productPrice":"94.12"}]
AFAIU, this is expected, since your input is only partially valid JSON: each line is a valid JSON object, but the content as a whole is neither a single object nor an array.
It can be parsed into a JSONArray after small modifications (note the opening and closing brackets and the comma between the objects):
[
{"id":4,"productId":9949940,"data":"product data 1","productPrice":"653.90"},
{"id":5,"productId":4940404,"data":"product data 2","productPrice":"94.12"}
]
Alternatively, you could split the input into individual JSON objects by hand and parse them one by one.
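A minimal sketch of the bracket-and-comma fix, assuming one JSON object per line: Collectors.joining places the separator only between elements, so no trailing comma is produced, and blank lines (one possible source of the doubled comma in the question's output) are skipped. Only the JDK is used and the class name is hypothetical; the resulting text can then be passed to new JSONArray(...).

```java
import java.io.BufferedReader;
import java.util.stream.Collectors;

class JsonLinesToArray {
    // Wraps one-object-per-line input in [ ] and puts commas only
    // between objects, never after the last one.
    static String toJsonArrayText(BufferedReader reader) {
        return reader.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty()) // skip blank lines
                .collect(Collectors.joining(",", "[", "]"));
    }
}
```

With the question's reader, new JSONArray(JsonLinesToArray.toJsonArrayText(reader)) would replace the manual StringBuilder loop.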
Your product details file does not contain a valid JSON array, so it is not possible to create a JSONArray directly from the file.
What you can do is read the product lines one by one, create a JSONObject from each line, and add it to a JSONArray. The following piece of code should help:
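JSONArray jsonArray = new JSONArray();
// new Scanner(File) throws FileNotFoundException; declare or catch it
try (Scanner s = new Scanner(new File("filepath"))) {
    while (s.hasNextLine()) {
        // nextLine(), not next(): next() stops at whitespace and would
        // split values such as "product data 1"
        String line = s.nextLine().trim();
        if (!line.isEmpty()) {
            jsonArray.put(new JSONObject(line));
        }
    }
}
// jsonArray can be printed by iterating through it, or with jsonArray.toString(2)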
Here is code you can use. The idea is that the InputStream does not represent valid JSON, so you have to turn it into a valid JSON string using a StringBuilder. But first you need to fix the input itself so that it is valid JSON (a single array with commas between the objects).
StringBuilder sb = new StringBuilder(); // initialized here so it is definitely assigned below
try (InputStream inputStream = new FileInputStream(new File("Path"));
     BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) {
        sb.append(line);
    }
} catch (IOException e) {
    e.printStackTrace();
}
// Succeeds only once the file content is a valid JSON array: [ {...}, {...} ]
JSONArray jsonArray = new JSONArray(sb.toString());
Dependency
<dependency>
<groupId>org.json</groupId>
<artifactId>json</artifactId>
<version>20211205</version>
</dependency>
Stream the file's lines into a List<String>, then parse each line as a JSONObject:
// Files.newBufferedReader reads UTF-8; try-with-resources closes it even on errors
try (BufferedReader bufferedReader = Files.newBufferedReader(Paths.get("path"))) {
    List<String> list = bufferedReader.lines().collect(Collectors.toList());
    JSONArray array = new JSONArray();
    for (String s : list) {
        if (!s.trim().isEmpty()) { // skip blank lines
            array.put(new JSONObject(s));
        }
    }
    System.out.println(array);
} catch (IOException e) {
    e.printStackTrace();
}
Dependency:
<dependency>
<groupId>org.json</groupId>
<artifactId>json</artifactId>
<version>20220320</version>
</dependency>

What's the difference between getBytes and serialize with String?

Just as the title says, I can't tell the difference between getBytes() and the serialization mechanism for a String. Below is a test comparing getBytes() with serialization:
public void testUTF() {
byte[] data = SerializeUtil.serUTFString(str);
System.out.println(data.length);
System.out.println(str.getBytes().length);
}
Here is SerializeUtil:
public static byte[] serUTFString(String data) {
byte[] result = null;
ObjectOutputStream oos = null;
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
try {
oos = new ObjectOutputStream(byteArray);
try {
oos.writeUTF(data);
oos.flush();
result = byteArray.toByteArray();
} finally {
oos.close();
}
} catch (Exception e) {
e.printStackTrace();
}
return result;
}
When I store str in Redis, both work correctly, but getBytes() seems more efficient. Since both return a byte array from a String, what's the difference? Is serialization necessary?
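String.getBytes() returns a byte array representing the string's characters in the platform's default encoding. ObjectOutputStream.writeUTF writes the string's length followed by its bytes in modified UTF-8 format; see the java.io.DataOutput API. On top of that, ObjectOutputStream writes its own stream header, so the serialized form is always larger than the plain bytes.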
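To make the extra bytes visible, the sketch below (hypothetical class name, JDK only) measures the same string three ways, using UTF-8 for getBytes rather than the platform default. DataOutputStream.writeUTF adds a 2-byte length prefix; ObjectOutputStream adds a stream header and block-data framing on top of that. Modified UTF-8 also differs from standard UTF-8 for U+0000, which it encodes as two bytes instead of one.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;

class StringBytesComparison {
    // Plain encoding: only the characters' bytes.
    static int getBytesLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // DataOutput.writeUTF: 2-byte length prefix + modified UTF-8 bytes.
    static int writeUtfLength(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return bytes.size();
    }

    // ObjectOutputStream: stream header + block-data header + writeUTF payload.
    static int objectStreamLength(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return bytes.size();
    }
}
```

So for storing a String in Redis, getBytes with an explicit charset is enough; the serialization wrapper only adds framing the reader must understand.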

How to binary (de)serialize object into/form string?

I need to serialize objects into a String and deserialize them again.
I read suggestions on Stack Overflow and wrote this code:
class Data implements Serializable {
int x = 5;
int y = 3;
}
public class Test {
public static void main(String[] args) {
Data data = new Data();
String out;
try {
// write
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(baos);
oos.writeObject(data);
out = new String(baos.toByteArray());
System.out.println(out);
// read ==========================================
ByteArrayInputStream bais = new ByteArrayInputStream(out.getBytes());
ObjectInputStream ois = new ObjectInputStream(bais);
Data d = (Data) ois.readObject();
System.out.println("d.x = " + d.x);
System.out.println("d.y = " + d.y);
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
}
}
but I get error:
java.io.StreamCorruptedException: invalid stream header: EFBFBDEF
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:298)
at p.Test.main(Test.java:37)
Why?
I expected:
d.x = 5
d.y = 3
How do I do this properly?
Note: I don't want to write this object to a file. I need it in String format.
Use
ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray()); instead of
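ByteArrayInputStream bais = new ByteArrayInputStream(out.getBytes());, since the String conversion corrupts the data because of the encoding: bytes that are not valid in the default charset (evidently UTF-8 here) are replaced by U+FFFD, whose UTF-8 encoding EF BF BD is what the error message shows instead of the serialization magic number AC ED.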
If you really need to store the result in a String, you need a safe way to store arbitrary bytes in a String. One way of doing that is Base64 encoding.
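A minimal sketch of the Base64 route, assuming Java 8+ for java.util.Base64 (the answer does not name a library); the class and method names are hypothetical. The serialized bytes are encoded into a String made only of Base64 characters and decoded back to exactly the same bytes before deserializing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

class Base64Serialization {
    // Serialize to bytes, then Base64-encode: the result contains only
    // A-Z, a-z, 0-9, '+', '/' and '=' padding, so it is safe as a String.
    static String toBase64(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Base64-decode back to the original bytes, then deserialize.
    static Object fromBase64(String text) throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(text);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

With the question's class: Data d = (Data) Base64Serialization.fromBase64(Base64Serialization.toBase64(data));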
A totally different approach would have been to not use the standard Java serialization for this class, but create your own Data to/from String converter.
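As an illustration of a hand-written converter for the question's Data class, here is a sketch using a hypothetical "x,y" text format; the format and method names are assumptions, not part of the original answer. It avoids Java serialization entirely, so there is no stream header and no class versioning involved.

```java
class Data {
    int x = 5;
    int y = 3;

    // Hypothetical text format "x,y": short, readable, and independent
    // of Java serialization.
    String toText() {
        return x + "," + y;
    }

    // Parses the "x,y" format back, rejecting anything else.
    static Data fromText(String text) {
        String[] parts = text.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected \"x,y\" but got: " + text);
        }
        Data d = new Data();
        d.x = Integer.parseInt(parts[0].trim());
        d.y = Integer.parseInt(parts[1].trim());
        return d;
    }
}
```

The trade-off is that every field must be handled by hand, but the format is under your control and stays stable when the class changes.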
It is not entirely true to say that converting to a String corrupts the data. Converting with UTF-8 does, because UTF-8 decoding is not bijective: not every byte sequence is a valid UTF-8 character sequence, and invalid sequences are replaced during decoding. ISO-8859-1, by contrast, is bijective: one char of the String corresponds to exactly one byte and vice versa.
Base64 encoding is not very space-efficient compared to this: it produces 4 characters for every 3 bytes, roughly 33% more.
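A small JDK-only check of both claims, with hypothetical names: all 256 byte values survive a round trip through an ISO-8859-1 String but not through UTF-8, and Base64 needs 344 characters for those 256 bytes where ISO-8859-1 needs 256. Note the bijection only holds while the text stays a Java String; if it is later stored as UTF-8 somewhere, bytes 0x80 and above take two bytes each.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class CharsetRoundTrip {
    // Every possible byte value 0..255.
    static byte[] allBytes() {
        byte[] b = new byte[256];
        for (int i = 0; i < 256; i++) {
            b[i] = (byte) i;
        }
        return b;
    }

    // ISO-8859-1 maps each byte to exactly one char and back again.
    static boolean survivesLatin1(byte[] data) {
        String s = new String(data, StandardCharsets.ISO_8859_1);
        return Arrays.equals(data, s.getBytes(StandardCharsets.ISO_8859_1));
    }

    // UTF-8 decoding replaces malformed sequences with U+FFFD, so the
    // original bytes are generally lost.
    static boolean survivesUtf8(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_8);
        return Arrays.equals(data, s.getBytes(StandardCharsets.UTF_8));
    }

    // Base64 output is 4 characters per 3 input bytes, rounded up.
    static int base64Length(byte[] data) {
        return Base64.getEncoder().encodeToString(data).length();
    }
}
```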
This is why I would recommend:
import java.io.*;
import java.nio.charset.StandardCharsets;

/**
 * Serialize any object
 * @param obj the object to serialize; must implement Serializable
 * @return the serialized bytes as a String, one char per byte
 */
public static String serialize(Object obj) {
    try {
        ByteArrayOutputStream bo = new ByteArrayOutputStream();
        ObjectOutputStream so = new ObjectOutputStream(bo);
        so.writeObject(obj);
        so.flush();
        // This encoding induces a bijection between byte[] and String (unlike UTF-8)
        return new String(bo.toByteArray(), StandardCharsets.ISO_8859_1);
    } catch (IOException e) {
        // Rethrow: merely printing the stack trace leaves nothing to return
        throw new UncheckedIOException(e);
    }
}
/**
 * Deserialize any object
 * @param str a String produced by serialize
 * @param cls the expected type of the object
 * @return the deserialized object
 */
public static <T> T deserialize(String str, Class<T> cls) {
    try {
        // This encoding induces a bijection between byte[] and String (unlike UTF-8)
        byte[] b = str.getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayInputStream bi = new ByteArrayInputStream(b);
        ObjectInputStream si = new ObjectInputStream(bi);
        return cls.cast(si.readObject());
    } catch (IOException | ClassNotFoundException e) {
        throw new IllegalArgumentException("cannot deserialize", e);
    }
}

Blob object not working properly even though the class is serialized

I have a serializable class that converts a very large data object to a blob to save it in the database. The same class has a decode method that converts the blob back to the actual object. Below is the code for encoding and decoding the object.
private byte[] encode(ScheduledReport schedSTDReport)
{
byte[] bytes = null;
try
{
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(bos);
oos.writeObject(schedSTDReport);
oos.flush();
oos.close();
bos.close();
//byte [] data = bos.toByteArray();
//ByteArrayOutputStream baos = new ByteArrayOutputStream();
//GZIPOutputStream out = new GZIPOutputStream(baos);
//XMLEncoder encoder = new XMLEncoder(out);
//encoder.writeObject(schedSTDReport);
//encoder.close();
bytes = bos.toByteArray();
//GZIPOutputStream out = new GZIPOutputStream(bos);
//out.write(bytes);
//bytes = bos.toByteArray();
}
catch (Exception e)
{
_log.error("Exception caught while encoding/zipping Scheduled STDReport", e);
}
decode(bytes);
return bytes;
}
/*
* Decode the report definition blob back to the
* ScheduledReport object.
*/
private ScheduledReport decode(byte[] bytes)
{
ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
ScheduledReport sSTDR = null;
try
{
ObjectInputStream ois = new ObjectInputStream(bais);
//GZIPInputStream in = new GZIPInputStream(bais);
//XMLDecoder decoder = new XMLDecoder(in);
sSTDR = (ScheduledReport)ois.readObject();//decoder.readObject();
//decoder.close();
}
catch (Exception e)
{
_log.error("IOException caught while decoding/unzipping Scheduled STDReport", e);
}
return sSTDR;
}
The problem is that whenever I change anything else in this class, e.g. any other method, a new class version is created, and the new version of the class is unable to decode the originally encoded blob. The object I pass to encode is also serializable, but the problem persists. Any ideas? Thanks.
Yup, Java binary serialization is pretty brittle :(
You can add a static serialVersionUID field to the class so that you can control the version numbers... this should prevent problems due to adding methods. You'll still run into potential issues when fields are added though. See the JavaDocs for Serializable for some more details.
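A minimal sketch of pinning the version, using a hypothetical stand-in class for the question's ScheduledReport. Without the field, the JVM computes a UID from the class structure (name, interfaces, fields, constructors and non-private methods), which is why adding a method breaks old blobs. For blobs already written by a class without an explicit UID, the value to copy into the field is the one the old class computed; ObjectStreamClass.lookup (or the JDK's serialver tool) run against the old class reports it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical stand-in for the question's ScheduledReport.
class ReportSketch implements Serializable {
    // Pinned version: adding methods no longer changes it.
    private static final long serialVersionUID = 1L;

    String title;

    ReportSketch(String title) {
        this.title = title;
    }

    // Reports the UID a class currently has (explicit or computed).
    static long currentUid(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    static byte[] encode(ReportSketch report) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(report);
        }
        return bytes.toByteArray();
    }

    static ReportSketch decode(byte[] blob) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return (ReportSketch) in.readObject();
        }
    }
}
```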
You might want to consider using another serialization format such as Protocol Buffers to give you more control though.
You can implement java.io.Externalizable so that you are able to control what is serialized and expected in deserialization.
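A sketch of the Externalizable approach with a hypothetical class: the class writes and reads its own fields, here prefixed by a self-managed format version. One caveat worth knowing: the stream still records the class's serialVersionUID and checks it on reading, so the UID is pinned here as well; otherwise adding a method would still break old blobs.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

// Hypothetical example: the class decides exactly what goes into the stream.
class ReportExternal implements Externalizable {
    private static final long serialVersionUID = 1L; // still checked on read
    private static final int FORMAT_VERSION = 1;     // our own format marker

    String title;
    int rows;

    // Externalizable requires a public no-arg constructor; it runs
    // before readExternal during deserialization.
    public ReportExternal() {
    }

    ReportExternal(String title, int rows) {
        this.title = title;
        this.rows = rows;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(FORMAT_VERSION);
        out.writeUTF(title);
        out.writeInt(rows);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        int version = in.readInt();
        if (version != FORMAT_VERSION) {
            throw new InvalidObjectException("unsupported format version " + version);
        }
        title = in.readUTF();
        rows = in.readInt();
    }
}
```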

How to encode a Map<String,String> as Base64 string?

I'd like to encode a Java map of strings as a single Base64-encoded string. The encoded string will be transmitted to a remote endpoint and may be manipulated by a malicious person. The worst that should happen is invalid key/value tuples; it should not introduce any other security risks.
Example:
Map<String,String> map = ...
String encoded = Base64.encode(map);
// somewhere else
Map<String,String> map = Base64.decode(encoded);
Yes, it must be Base64, not any of the alternatives. Is there an existing lightweight solution out there (a single utility class preferred)? Or do I have to create my own?
Anything better than this?
// marshalling
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(baos);
oos.writeObject(map);
oos.close();
String encoded = new String(Base64.encodeBase64(baos.toByteArray()));
// unmarshalling
byte[] decoded = Base64.decodeBase64(encoded.getBytes());
ByteArrayInputStream bais = new ByteArrayInputStream(decoded);
ObjectInputStream ois = new ObjectInputStream(bais);
map = (Map<String,String>) ois.readObject();
ois.close();
Thanks,
my primary requirements are: the encoded string should be as short as possible and contain only Latin characters or characters from the Base64 alphabet (not my call). There are no other requirements.
Use Google Gson to convert Map to JSON. Use GZIPOutputStream to compress the JSON string. Use Apache Commons Codec Base64 or Base64OutputStream to encode the compressed bytes to a Base64 string.
Kickoff example:
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import org.apache.commons.codec.binary.Base64InputStream;
import org.apache.commons.codec.binary.Base64OutputStream;
import java.io.*;
import java.lang.reflect.Type;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public static void main(String[] args) throws IOException {
    Map<String, String> map = new HashMap<String, String>();
    map.put("key1", "value1");
    map.put("key2", "value2");
    map.put("key3", "value3");
    String serialized = serialize(map);
    Map<String, String> deserialized = deserialize(serialized, new TypeToken<Map<String, String>>() {}.getType());
    System.out.println(deserialized);
}
public static String serialize(Object object) throws IOException {
    ByteArrayOutputStream byteaOut = new ByteArrayOutputStream();
    GZIPOutputStream gzipOut = null;
    try {
        // lineLength 0: no CRLF line breaks, so the output contains only Base64 characters
        gzipOut = new GZIPOutputStream(new Base64OutputStream(byteaOut, true, 0, null));
        gzipOut.write(new Gson().toJson(object).getBytes("UTF-8"));
    } finally {
        if (gzipOut != null) try { gzipOut.close(); } catch (IOException logOrIgnore) {}
    }
    return new String(byteaOut.toByteArray(), "US-ASCII");
}
public static <T> T deserialize(String string, Type type) throws IOException {
    ByteArrayOutputStream byteaOut = new ByteArrayOutputStream();
    GZIPInputStream gzipIn = null;
    try {
        gzipIn = new GZIPInputStream(new Base64InputStream(new ByteArrayInputStream(string.getBytes("UTF-8"))));
        for (int data; (data = gzipIn.read()) > -1;) {
            byteaOut.write(data);
        }
    } finally {
        if (gzipIn != null) try { gzipIn.close(); } catch (IOException logOrIgnore) {}
    }
    // Decode as UTF-8 to match the encoding used in serialize
    return new Gson().fromJson(new String(byteaOut.toByteArray(), "UTF-8"), type);
}
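Another possible way would be to use the JSON library (org.json), which is very lightweight.
The encoding would then look like this:
JSONObject jso = new JSONObject(map);
// toString() without an indent factor gives the most compact JSON
String encoded = new String(Base64.encodeBase64(jso.toString().getBytes(StandardCharsets.UTF_8)), StandardCharsets.US_ASCII);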
Your solution works. The only other approach would be to serialize the map yourself (iterate over the keys and values). That would mean you'd have to make sure you handle all the cases correctly (for example, if you transmit the values as key=value, you must find a way to allow = in the key/value and you must separate the pairs somehow which means you must also allow this separation character in the name, etc).
All in all, it's hard to get right, easy to get wrong and would take a whole lot more code and headache. Plus don't forget that you'd have to write a lot of error handling code in the parser (receiver side).
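To show what "serialize the map yourself" involves, here is a JDK-only sketch with a hypothetical MapCodec class and format. Each key and value is encoded with URL-safe Base64 without padding, whose alphabet (A-Z, a-z, 0-9, '-', '_') can never contain the separators '.' and ','; the whole text is then Base64-encoded once more so the final result uses only Base64 characters. The decoder rejects malformed input with IllegalArgumentException, which is the receiver-side error handling the answer warns about. The double encoding makes the output longer than the gzip approach.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

class MapCodec {
    private static final Base64.Encoder ENC = Base64.getUrlEncoder().withoutPadding();
    private static final Base64.Decoder DEC = Base64.getUrlDecoder();

    // key.value pairs separated by ','; parts are Base64, so no escaping needed.
    static String encode(Map<String, String> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : map.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(part(e.getKey())).append('.').append(part(e.getValue()));
        }
        return part(sb.toString());
    }

    // Throws IllegalArgumentException on anything that is not a valid encoding.
    static Map<String, String> decode(String encoded) {
        String inner = unpart(encoded);
        Map<String, String> map = new LinkedHashMap<>();
        if (inner.isEmpty()) {
            return map;
        }
        for (String pair : inner.split(",", -1)) {
            String[] kv = pair.split("\\.", -1);
            if (kv.length != 2) {
                throw new IllegalArgumentException("malformed pair: " + pair);
            }
            map.put(unpart(kv[0]), unpart(kv[1]));
        }
        return map;
    }

    private static String part(String s) {
        return ENC.encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    private static String unpart(String s) {
        return new String(DEC.decode(s), StandardCharsets.UTF_8);
    }
}
```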
