I have to fetch a file from an FTP server; the file is in binary ASN.1 format.
I need to convert it to a text file and parse the relevant data.
I am using JDK 1.7. I can also use a third-party jar, but it should be license-free.
If someone could give me an example, that would be even better.
I would suggest using Bouncy Castle: http://www.bouncycastle.org/java.html
After fetching the file from the FTP server, use the org.bouncycastle.asn1.util.ASN1Dump class for a quick check:
ASN1InputStream stream = new ASN1InputStream(new ByteArrayInputStream(data));
ASN1Primitive object = stream.readObject();
System.out.println(ASN1Dump.dumpAsString(object));
This will print the structure of your file.
If you know the structure of your file, you will need to parse it with something like:
ASN1InputStream stream = new ASN1InputStream(new ByteArrayInputStream(data));
DERApplicationSpecific application = (DERApplicationSpecific) stream.readObject();
ASN1Sequence sequence = (ASN1Sequence) application.getObject(BERTags.SEQUENCE);
Enumeration secEnum = sequence.getObjects(); // "enum" is a reserved word in Java
while (secEnum.hasMoreElements()) {
    ASN1Primitive object = (ASN1Primitive) secEnum.nextElement();
    System.out.println(object);
}
By the way, ASN1Primitive is the base ASN.1 object read from a byte stream. It has plenty of subtypes (http://www.borelly.net/cb/docs/javaBC-1.4.8/prov/org/bouncycastle/asn1/ASN1Primitive.html) that you can cast to in order to get the right type.
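For instance, here is a minimal sketch of dispatching on the concrete type (all classes come from the org.bouncycastle.asn1 package, and object is the ASN1Primitive read above):
if (object instanceof ASN1Integer) {
    System.out.println("integer: " + ((ASN1Integer) object).getValue());
} else if (object instanceof DERUTF8String) {
    System.out.println("utf8 string: " + ((DERUTF8String) object).getString());
} else if (object instanceof ASN1Sequence) {
    // sequences can be walked recursively via getObjects()
    System.out.println("sequence of " + ((ASN1Sequence) object).size() + " elements");
}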
The following code is used to serialize the data.
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
BinaryEncoder binaryEncoder =
EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(data.getSchema());
datumWriter.setSchema(data.getSchema());
datumWriter.write(data, binaryEncoder);
binaryEncoder.flush();
byteArrayOutputStream.close();
result = byteArrayOutputStream.toByteArray();
I used the following command
FileUtils.writeByteArrayToFile(new File("D:/sample.avro"), result);
to write the avro byte array to a file. But when I try to read it back using
File file = new File("D:/sample.avro");
try {
dataFileReader = new DataFileReader(file, datumReader);
} catch (IOException exp) {
System.out.println(exp);
System.exit(1);
}
it throws an exception:
java.io.IOException: Not a data file.
at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:89)
What is the problem happening here? I referred to two other similar Stack Overflow questions (this and this) but they haven't been of much help to me. Can someone help me understand this?
The actual data is encoded in the Avro binary format, but typically what's passed around is more than just the encoded data.
What most people think of as an "avro file" is a format that includes the header (which has things like the writer schema) and then the actual data: https://avro.apache.org/docs/current/spec.html#Object+Container+Files. The first four bytes of an avro file should be "Obj" followed by the byte 1, i.e. 0x4F626A01. The error you are getting is because the binary you are trying to read as a data file doesn't start with these standard magic bytes.
Another standard format is the single object encoding: https://avro.apache.org/docs/current/spec.html#single_object_encoding. This type of binary format should start with 0xC301.
But if I had to guess, the binary you have could just be the raw serialized data without any sort of header information. Though it's hard to know for sure without knowing how the byte array that you have was created.
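A quick way to tell what you have is to peek at the first bytes. A minimal sketch (uses java.io.FileInputStream and java.util.Arrays; the path is the one from the question):
byte[] header = new byte[4];
try (InputStream in = new FileInputStream("D:/sample.avro")) {
    in.read(header);
}
// object container files start with "Obj" followed by the byte 1
byte[] magic = { 'O', 'b', 'j', 1 };
System.out.println(Arrays.equals(header, magic)
        ? "object container file"
        : "no container header: raw datum or single object encoding?");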
You'd need to utilize Avro to write the data as well as read it; otherwise the schema isn't written (hence the "Not a data file" message). (See: https://cwiki.apache.org/confluence/display/AVRO/FAQ#FAQ-HowcanIserializedirectlyto/fromabytearray?)
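For example, here is a hedged sketch of the writing side, assuming data is the GenericRecord from the question. DataFileWriter writes the header (magic bytes plus writer schema), so DataFileReader can later recognize the file:
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(data.getSchema());
try (DataFileWriter<GenericRecord> fileWriter = new DataFileWriter<>(datumWriter)) {
    fileWriter.create(data.getSchema(), new File("D:/sample.avro"));
    fileWriter.append(data); // append one record per datum
}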
If you're just looking to serialize an object, see: https://mkyong.com/java/how-to-read-and-write-java-object-to-a-file/
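In that case a plain Java serialization round-trip is enough. A minimal sketch of what the linked article covers (MyType and myObject are placeholder names; the class must implement java.io.Serializable):
try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("obj.ser"))) {
    oos.writeObject(myObject); // writes the object graph in Java's native format
}
try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream("obj.ser"))) {
    MyType restored = (MyType) ois.readObject(); // cast back to the known type
}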
I need to convert the following to a Spark DataFrame in Java, preserving the structure according to the avro schema, and then write it to S3 based on that avro structure.
GenericRecord r = new GenericData.Record(inAvroSchema);
r.put("id", "1");
r.put("cnt", 111);
Schema enumTest =
SchemaBuilder.enumeration("name1")
.namespace("com.name")
.symbols("s1", "s2");
GenericData.EnumSymbol symbol = new GenericData.EnumSymbol(enumTest, "s1");
r.put("type", symbol);
ByteArrayOutputStream bao = new ByteArrayOutputStream();
GenericDatumWriter<GenericRecord> w = new GenericDatumWriter<>(inAvroSchema);
Encoder e = EncoderFactory.get().jsonEncoder(inAvroSchema, bao);
w.write(r, e);
e.flush();
I can create the object based on the JSON structure:
Object o = reader.read(null, DecoderFactory.get().jsonDecoder(inAvroSchema, new ByteArrayInputStream(bao.toByteArray())));
But maybe there is a way to create a DataFrame based on ByteArrayInputStream(bao.toByteArray())?
Thanks
No, you have to use a Data Source to read Avro data.
And it's crucial for Spark to read Avro as files from a filesystem, because many optimizations and features (such as compression and partitioning) depend on it.
You have to add the spark-avro package (unless you are on Spark 2.4 or above).
Note that the enum type you are using will become a String in Spark's Dataset.
Also see this: Spark: Read an inputStream instead of File
Alternatively, you can consider deploying a bunch of tasks with SparkContext#parallelize and reading/writing the files explicitly with a DatumReader/DatumWriter.
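A minimal sketch of the file-based route (paths and names are illustrative; format("avro") assumes Spark 2.4 or later, older versions use "com.databricks.spark.avro"):
SparkSession spark = SparkSession.builder().appName("avro-to-df").getOrCreate();
// read the .avro container file written beforehand
Dataset<Row> df = spark.read().format("avro").load("/tmp/record.avro");
// write it back out as avro, e.g. to S3
df.write().format("avro").save("s3a://my-bucket/out/");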
I have a Java class already serialized and stored as a .ser file, but I want to convert it to JSON (.json format). This is because serialization seems to be inefficient in terms of appending directly, and it further causes file corruption due to stream corruption errors. Is there an efficient way to convert this Java serialized file to JSON format?
You can read the .ser file with an ObjectInputStream, map the resulting object to JSON using Gson, and write it to a .json file:
ObjectInputStream ins = new ObjectInputStream(new FileInputStream("c:\\student.ser"));
Student student = (Student) ins.readObject();
ins.close();
Gson gson = new Gson();
// convert the Java object to a JSON formatted string
String json = gson.toJson(student);
try {
    // write the converted JSON data to a file named "file.json"
    FileWriter writer = new FileWriter("c:\\file.json");
    writer.write(json);
    writer.close();
} catch (IOException e) {
    e.printStackTrace();
}
There is no standard way to do it in Java, and there is no silver bullet: there are a lot of libraries for this. I prefer Jackson: https://github.com/FasterXML/jackson
ObjectMapper mapper = new ObjectMapper();
// object == ??? read from *.ser
String s = mapper.writeValueAsString(object);
You can see a list of libraries for JSON serialization/deserialization (for Java and not only Java) here: http://json.org/
this is because serialization seems to be inefficient in terms of appending directly
Not sure if JSON is the answer for you. Could you share with us some examples of data and what manipulations you do with it?
You can try Google Protocol Buffers as an alternative to Java serialization and JSON.
In my answer in the topic below there is an overview of what GPB is and how to use it, so you can check whether it suits you:
How to write/read binary files that represent objects?
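For a flavor of the API, a hypothetical sketch (it assumes a Student message compiled by protoc; every name here is illustrative):
// student.proto: message Student { string name = 1; int32 id = 2; }
Student student = Student.newBuilder()
        .setName("Alyssa")
        .setId(1)
        .build();
byte[] bytes = student.toByteArray();      // compact binary, no schema embedded
Student parsed = Student.parseFrom(bytes); // throws InvalidProtocolBufferException on bad input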
I am using Apache Avro for data serialization. Since the data has a fixed schema, I do not want the schema to be part of the serialized data. In the following example, the schema is part of the avro file "users.avro".
User user1 = new User();
user1.setName("Alyssa");
user1.setFavoriteNumber(256);
User user2 = new User("Ben", 7, "red");
User user3 = User.newBuilder()
.setName("Charlie")
.setFavoriteColor("blue")
.setFavoriteNumber(null)
.build();
// Serialize user1 and user2 to disk
File file = new File("users.avro");
DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
dataFileWriter.create(user1.getSchema(), new File("users.avro"));
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.append(user3);
dataFileWriter.close();
Can anyone please tell me how to store avro-files without schema embedded in it?
Here you can find a comprehensive how-to in which I explain how to achieve schema-less serialization using Apache Avro.
A companion test campaign shows some figures on the performance you might expect.
The code is on GitHub: example and test classes show how to use the Data Reader and Writer with a stub class generated by Avro itself.
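The gist of it is a sketch like this (not the linked code verbatim): only the encoded fields go to the stream, no container header, so reader and writer must already agree on the schema. User is the generated class from the question:
// writing: raw datum, no header
ByteArrayOutputStream out = new ByteArrayOutputStream();
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
DatumWriter<User> writer = new SpecificDatumWriter<>(User.class);
writer.write(user1, encoder);
encoder.flush();
byte[] bytes = out.toByteArray();

// reading: the consumer supplies the same schema via the generated class
DatumReader<User> reader = new SpecificDatumReader<>(User.class);
BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
User roundTripped = reader.read(null, decoder);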
Should be doable.
Given an encoder, you can use a DatumWriter to write data directly to a ByteArrayOutputStream (which you can then write to a java.io.File).
Here's how to get started in Scala (from Salat-Avro):
val baos = new ByteArrayOutputStream
val encoder = EncoderFactory.get().binaryEncoder(baos, null)
writer.write(myRecord, encoder) // writer: the DatumWriter for myRecord's schema
encoder.flush()
I've written a small HTTP server using Netty by following the example HTTP server, and now I'm trying to adapt it to my needs (a small app that should send JSON). I began by manually encoding my POJOs to JSON using Jackson and then using the StringEncoder to get a ChannelBuffer. Now I'm trying to generalize it slightly by extracting the bit that encodes the POJOs to JSON into a HttpContentEncoder, and I've managed to implement that more or less.
The part that I can't figure out is how to set the content on the HttpResponse. It expects a ChannelBuffer, but how do I get my object into a ChannelBuffer?
Edit
Say I have a handler with code like below and a HttpContentEncoder that knows how to serialize SomeSerializableObject. How do I get my content (SomeSerializableObject) to the HttpContentEncoder? That's what I'm looking for.
SomeSerializableObject obj = ...
// This won't work because HttpMessage expects a ChannelBuffer
HttpRequest res = ...
res.setContent(obj);
Channel ch = ...
ch.write(res);
After looking into it a bit more, though, I'm unsure whether this is what HttpContentEncoder is meant to do, or whether it is rather for things like compression.
Most object serialization/deserialization libraries use InputStream and OutputStream. You can create a dynamic buffer (or a wrapped buffer for deserialization) and wrap it with ChannelBufferOutputStream (or ChannelBufferInputStream) to feed the serialization library. For example:
// Deserialization
HttpMessage m = ...;
ChannelBuffer content = m.getContent();
InputStream in = new ChannelBufferInputStream(content);
Object contentObject = myDeserializer.decode(in);
// Serialization
HttpMessage m = ...;
Object contentObject = ...;
ChannelBuffer content = ChannelBuffers.dynamicBuffer();
OutputStream out = new ChannelBufferOutputStream(content);
mySerializer.encode(contentObject, out);
m.setContent(content);
If the serialization library allows you to use a byte array instead of streams, this can be much simpler using ChannelBuffer.array() and ChannelBuffer.arrayOffset().
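A short sketch of that byte-array path (encodeToBytes and decodeFromBytes are hypothetical helpers on your serializer):
// Serialization: wrap the already-encoded bytes without copying
byte[] bytes = mySerializer.encodeToBytes(contentObject);
m.setContent(ChannelBuffers.wrappedBuffer(bytes));

// Deserialization: copy the readable bytes out of the buffer
ChannelBuffer content = m.getContent();
byte[] raw = new byte[content.readableBytes()];
content.readBytes(raw);
Object contentObject = myDeserializer.decodeFromBytes(raw);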