Kaitai code writing - java

I recently started using kaitai-struct for dealing with arbitrary binary formats. I have created the .ksy file for my data and compiled it to my target language, which is Java. Now can anyone point me to how to pass in the input file that holds the data, and how to get the parsed data as output, so that I can write code to manipulate that data to my requirements? Is there any tutorial on how to write code depending on the data we get?
Thanks in advance.

First you have to generate the Java classes from the .ksy file using the Kaitai Struct Compiler or the WebIDE. You can find more information on how to use the compiler in the Kaitai user guide.
If you use the WebIDE, simply right-click on your .ksy file and select the Generate parser > Java menu item.
After you have the generated Java code, you can parse a structure directly from a local file like this:
AnExampleClass output = AnExampleClass.fromFile("an_example.data");
// ... manipulate output ...
Or you can parse a structure from a byte array (byte[]):
AnExampleClass output = new AnExampleClass(new KaitaiStream(byteArray));
// ... manipulate output ...
Note that parsing from non-seekable streams (e.g. FileInputStream, BufferedInputStream, etc.) is not supported and probably won't be supported, as a lot of the parsing functionality in KS relies on seek support.
You can read the generic documentation on how to use the API here, and you can find the Java-specific documentation here.

The answer from koczkatamas is outdated: there are now specific KaitaiStream implementations. The snippet would be
AnExampleClass output = new AnExampleClass(new ByteBufferKaitaiStream(byteArray));
See this issue for more details.
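Putting it together, a minimal end-to-end sketch with the current API could look like this; AnExampleClass and an_example.data are the placeholder names from above, and the header() accessor mentioned in the comment is purely hypothetical:

import java.nio.file.Files;
import java.nio.file.Paths;

import io.kaitai.struct.ByteBufferKaitaiStream;

public class ParseExample {
    public static void main(String[] args) throws Exception {
        // Parse straight from a local file via the generated helper...
        AnExampleClass fromFile = AnExampleClass.fromFile("an_example.data");

        // ...or hand the parser a byte array wrapped in a ByteBufferKaitaiStream
        byte[] bytes = Files.readAllBytes(Paths.get("an_example.data"));
        AnExampleClass fromBytes = new AnExampleClass(new ByteBufferKaitaiStream(bytes));

        // Each attribute declared in the .ksy becomes a getter on the generated
        // class, e.g. a hypothetical "header" attribute would be fromBytes.header()
        System.out.println(fromBytes);
    }
}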

Related

Is there any way/method to append a file (e.g. jpg, png, pdf) to a CSV file with the Java programming language?

We are currently doing research on how to append files to a CSV file with the Java programming language.
Therefore, we are looking for some input & guidance on this, which would be very useful for us.
Your help & guidance will be much appreciated.
Thanks in advance.
No, there is not, but the reason why is not specific to the Java programming language. Rather, it pertains to the nature of CSV in general.
CSV stands for "comma separated values", and it is a way to store database information in plaintext. If you read RFC 4180, you'll see a formal definition outlined there.
Part of the definition of CSV is that it is stored in plain text. That means that only numbers and strings can be stored in CSV files.
Thus, it is impossible to store image files in CSV files, unless you serialize your images into strings. More info on how to do that is here: How can I convert an image to a base64 string using Java?
Of course it's possible. All you need is to convert your binary data to a Base 64 string representation.
Apache commons codec does this for you:
Just use:
// requires org.apache.commons.codec.binary.Base64 from Apache Commons Codec
byte[] binaryData = someMethodToReadYourFileIntoBytes();
String encodedData = Base64.encodeBase64String(binaryData);
Refer to the javadoc for more information and other options on how to go about it.
Hope it helps.
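For completeness, here is what the full round trip could look like using the JDK's built-in java.util.Base64 (Java 8+) instead of commons-codec; the file names and the two-column CSV layout are just examples:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

public class CsvAttachmentExample {
    public static void main(String[] args) throws IOException {
        // Read the binary file and encode it as a Base64 string
        byte[] binaryData = Files.readAllBytes(Paths.get("photo.jpg"));
        String encodedData = Base64.getEncoder().encodeToString(binaryData);

        // Store it as one field of a CSV row (standard Base64 contains no commas or newlines)
        String csvRow = "photo.jpg," + encodedData + "\n";
        Files.write(Paths.get("attachments.csv"), csvRow.getBytes(StandardCharsets.UTF_8));

        // Later: split the row and decode the field back into the original bytes
        String[] fields = csvRow.trim().split(",", 2);
        byte[] restored = Base64.getDecoder().decode(fields[1]);
        Files.write(Paths.get("restored_" + fields[0]), restored);
    }
}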

Generate code from antlr tokens

We are currently working on generating new code using ANTLR. We have a grammar file that can recognize pretty much everything. Now, our problem is that we want to be able to recreate the code from the tokens that we generate, in order to produce this new file.
We have a .txt file with our tokens that looks like this:
[#0,0:6=' ',<75>,channel=1,1:0]
[#1,7:20='IDENTIFICATION',<6>,1:7]
[#2,21:21=' ',<75>,channel=1,1:21]
[#3,22:29='DIVISION',<4>,1:22]
[#4,30:30='.',<3>,1:30]
[#5,31:40='\n \t ',<75>,channel=1,1:31]
[#6,41:50='PROGRAM-ID',<16>,2:9]
[#7,51:51='.',<3>,2:19]
[#8,52:52=' ',<75>,channel=1,2:20]
[#9,53:59='testpro',<76>,2:21]
[#10,60:60='.',<3>,2:28]
[#11,61:70='\n \t ',<75>,channel=1,2:29]
[#12,71:76='AUTHOR',<31>,3:9]
[#13,77:77='.',<3>,3:15]
Or is there another way to recreate the original code using these tokens?
Thanks in advance, Viktor
The most straightforward way to make the lexer output portable is to serialize the tokenized output of the lexer for transport and storage. You could equally serialize the entire parser-generated parse tree. In either case, you will be capturing the full text of the source input.
The lexer's token stream object is intrinsically a single class, and the parse tree object is also quite simple, involving just a handful of standard classes. Consequently, the complexity of serialization & deserialization is almost entirely a linear function of the size of the parsed source input.
Google Gson is a simple-to-use, relatively fast Java object serialization library.
If your parser is generating some intermediate representation of the parsed source input, you could directly transport the IR using a defined record serialization library like Google FlatBuffers to save & restore IR model instances.
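To illustrate the first suggestion (serializing the token stream and rebuilding the text from it), here is a rough sketch using Gson; CobolLexer is a hypothetical generated lexer name, and TokenRecord is a small POJO you would define yourself:

import java.util.ArrayList;
import java.util.List;

import com.google.gson.Gson;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.Token;

public class TokenTransport {
    // Plain POJO capturing just what we need to rebuild the source text
    static class TokenRecord {
        String text;
        int type;
        int channel;
    }

    public static String tokensToJson(String source) {
        // CobolLexer is a placeholder for whatever lexer your grammar generates
        CobolLexer lexer = new CobolLexer(CharStreams.fromString(source));
        List<TokenRecord> records = new ArrayList<>();
        for (Token t : lexer.getAllTokens()) {  // includes hidden-channel tokens (whitespace)
            TokenRecord r = new TokenRecord();
            r.text = t.getText();
            r.type = t.getType();
            r.channel = t.getChannel();
            records.add(r);
        }
        return new Gson().toJson(records);
    }

    public static String jsonToSource(String json) {
        TokenRecord[] records = new Gson().fromJson(json, TokenRecord[].class);
        StringBuilder sb = new StringBuilder();
        for (TokenRecord r : records) {
            sb.append(r.text);  // concatenating all tokens (incl. whitespace) restores the text
        }
        return sb.toString();
    }
}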

Parsing ASN.1 binary data with Java

I have binary ASN.1 data objects I need to parse in my Java project. I just want the ASN.1 structure and data as it is parsed, for example, by a BER viewer.
The ASN.1 parser of BouncyCastle is not able to parse this structure (it only returns an application-specific binary data type).
What ASN.1 library can I use to get such a result? Does anybody have sample code that demonstrates how to parse an ASN.1 object?
BTW: I also tried several free ASN.1 Java compilers, but none is able to generate working Java code given my ASN.1 specification.
I have to correct myself - it is possible to read out the data using the ASN.1 parser included in BouncyCastle; however, the process is not that simple.
If you only want to print the data contained in an ASN.1 structure, I recommend using the class org.bouncycastle.asn1.util.ASN1Dump. It can be used with the following simple code snippet:
ASN1InputStream bIn = new ASN1InputStream(new ByteArrayInputStream(data));
ASN1Primitive obj = bIn.readObject();
System.out.println(ASN1Dump.dumpAsString(obj));
It prints the structure but not the data, but by copying ASN1Dump into your own class and modifying it to also print out, for example, OCTET STRINGs, this can be done easily.
Additionally, the code in ASN1Dump demonstrates how to parse ASN.1 structures. For example, the data used in my question can be parsed one level deeper using the following code:
DERApplicationSpecific app = (DERApplicationSpecific) obj;
ASN1Sequence seq = (ASN1Sequence) app.getObject(BERTags.SEQUENCE);
Enumeration secEnum = seq.getObjects();
while (secEnum.hasMoreElements()) {
ASN1Primitive seqObj = (ASN1Primitive) secEnum.nextElement();
System.out.println(seqObj);
}
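If you want the values printed without copying ASN1Dump, a small recursive walker along these lines also works (a sketch against BouncyCastle's ASN1 classes; adjust the casts to your BC version):

import java.util.Enumeration;

import org.bouncycastle.asn1.ASN1Encodable;
import org.bouncycastle.asn1.ASN1OctetString;
import org.bouncycastle.asn1.ASN1Primitive;
import org.bouncycastle.asn1.ASN1Sequence;
import org.bouncycastle.util.encoders.Hex;

public class Asn1Walker {
    // Recursively print the structure, dumping OCTET STRING contents as hex
    static void walk(ASN1Primitive obj, String indent) {
        if (obj instanceof ASN1Sequence) {
            System.out.println(indent + "SEQUENCE");
            Enumeration<?> e = ((ASN1Sequence) obj).getObjects();
            while (e.hasMoreElements()) {
                ASN1Encodable child = (ASN1Encodable) e.nextElement();
                walk(child.toASN1Primitive(), indent + "  ");
            }
        } else if (obj instanceof ASN1OctetString) {
            byte[] octets = ((ASN1OctetString) obj).getOctets();
            System.out.println(indent + "OCTET STRING " + Hex.toHexString(octets));
        } else {
            System.out.println(indent + obj);  // fall back to the object's own toString()
        }
    }
}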
Just use "true" to print values
ASN1InputStream ais = new ASN1InputStream(
        new FileInputStream(new File("d:/myfile.cdr")));
while (ais.available() > 0) {
    ASN1Primitive obj = ais.readObject();
    System.out.println(ASN1Dump.dumpAsString(obj, true));
}
ais.close();
It is not clear from your question whether or not you have the ASN.1 specification for the BER you are trying to parse. Please note that without the ASN.1 specification, you can only make partial sense of the data, and only if EXPLICIT TAGS were used in the ASN.1 specification from which it was generated. Some tools, such as the one from OSS Nokalva, have a library (jar file) called JIAAPI which allows you to traverse and manipulate BER encodings without prior knowledge of the ASN.1 specification.
If you do have the ASN.1 specification, any ASN.1 Java compiler should be able to handle this.
You can download a free trial of the OSS ASN.1 Tools for Java from http://www.oss.com/asn1/products/asn1-download.html to see if it works better for you than the others you unsuccessfully tried.
I need to be able to parse any kind of ASN.1 data in krypt. Although krypt is a Ruby project, you may want to have a look at the JRuby extension - the code for handling ASN.1 parsing/encoding is written entirely in Java and modular enough for easy extraction.
I also made a Java-only version, but it is missing some of the higher-level functionality of the former. But since it's concise, maybe it's a good opportunity to get you started.
If you just want to decode the BER-encoded data, there are numerous parsers out there. Have you tried any? There are even two in the Sun JDK - com.sun.jmx.snmp.BerDecoder and com.sun.jndi.ldap.BerDecoder.

How do I convert a Java Hashtable to an NSDictionary (obj-C)?

At the server end (GAE), I've got a java Hashtable.
At the client end (iPhone), I'm trying to create an NSDictionary.
myHashTable.toString() gets me something that looks darned-close-to-but-not-quite-the-same-as [myDictionary description]. If they were the same, I could write the string to a file and do:
NSDictionary *dict = [NSDictionary dictionaryWithContentsOfFile:tmpFile];
I could write a little parser in obj-C to deal with myHashtable.toString(), but I'm sort-of hoping that there's a shortcut already built into something, somewhere -- I just can't seem to find it.
(So, being a geek, I'll spend far longer searching the web for a shortcut than it would take me to write & debug the parser... ;)
Anyway -- hints?
Thanks!
I would convert the Hashtable into something JSON-like and parse that on the iPhone side.
Hashtable.toString() is not ideal; it will have problems with spaces, commas and quotation marks.
For the JSON-to-NSDictionary step, you can find the json-framework tools at http://www.json.org/
As j-16 SDiZ mentioned, you need to serialize your hashtable. It can be to json, xml or some other format. Once serialized, you need to deserialize them into an NSDictionary. JSON is probably the easiest format to do this with plenty of libraries for both Objective-C and Java. http://json.org has a list of libraries.
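On the Java/GAE side the conversion is essentially a one-liner once you pick a JSON library; a minimal sketch with Gson (any of the libraries listed at json.org would do equally well):

import java.util.Hashtable;

import com.google.gson.Gson;

public class HashtableToJson {
    public static void main(String[] args) {
        Hashtable<String, String> table = new Hashtable<>();
        table.put("name", "example");
        table.put("count", "42");

        // Gson turns the Hashtable into a JSON object, e.g. {"name":"example","count":"42"}
        String json = new Gson().toJson(table);
        System.out.println(json);
        // Send this string to the iPhone and parse it into an NSDictionary
        // with any Objective-C JSON library (e.g. json-framework).
    }
}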

Developing a (file) exchange format for java

I want to come up with a binary format for passing data between application instances in the form of POFs (Plain Old Files ;)).
Prerequisites:
should be cross-platform
information to be persisted includes a single POJO & arbitrary byte[]s (files actually; the POJO stores their names in a String[])
only sequential access is required
should be a way to check data consistency
should be small and fast
should prevent an average user with archiver + notepad from modifying the data
Currently I'm using DeflaterOutputStream + OutputStreamWriter together with InflaterInputStream + InputStreamReader to save/restore objects serialized with XStream, one object per file. Readers/Writers use UTF8.
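Roughly, the current save path looks like this (a simplified sketch; class and method names are just placeholders):

import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

import com.thoughtworks.xstream.XStream;

public class CurrentSavePath {
    public static void save(Object pojo, String path) throws Exception {
        XStream xstream = new XStream();
        try (Writer writer = new OutputStreamWriter(
                new DeflaterOutputStream(new FileOutputStream(path)),
                StandardCharsets.UTF_8)) {
            // XStream serializes the POJO to XML straight into the compressed stream;
            // restoring mirrors this with InflaterInputStream + InputStreamReader.
            xstream.toXML(pojo, writer);
        }
    }
}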
Now I need to extend this to support the requirements described above.
My idea of format:
{serialized to XML object}
{delimiter}
{String file name}{delimiter}{byte[] file data}
{delimiter}
{another String file name}{delimiter}{another byte[] file data}
...
{delimiter}
{delimiter}
{MD5 hash for the entire file}
Does this look sane?
What would you use for a delimiter and how would you determine it?
The right way to calculate MD5 in this case?
What would you suggest to read on the subject?
TIA.
It looks INsane.
Why invent a new file format?
Why try to prevent only stupid users from changing the file?
Why use a binary format (hard to compress)?
Why use a format that cannot be parsed while being received? (The receiver has to receive the entire file before being able to act on it.)
XML is already a serialization format that is compressible. So you are serializing a serialized format.
Would serialization of the model (if you are into MVC) not be another way? I'd prefer to use things in the language (or standard libraries) rather than roll my own if possible. The only issue I can see with that is that the file size may be larger than you want.
1) Does this look sane?
It looks fairly sane. However, if you are going to invent your own format rather than just using Java serialization then you should have a good reason. Do you have any good reasons (they do exist in some cases)? One of the standard reasons for using XStream is to make the result human readable, which a binary format immediately loses. Do you have a good reason for a binary format rather than a human readable one? See this question for why human readable is good (and bad).
Wouldn't it be easier just to put everything in a signed jar? There are already standard Java libraries and tools to do this, and you get compression and verification provided.
2) What would you use for a delimiter and how would you determine it?
Rather than a delimiter I'd explicitly store the length of each block before the block. It's just as easy, and prevents you having to escape the delimiter if it comes up on its own.
3) The right way to calculate MD5 in this case?
There is example code here which looks sensible.
4) What would you suggest to read on the subject?
On the subject of serialization? I'd read about the Java serialization, JSON, and XStream serialization so I understood the pros and cons of each, especially the benefits of human readable files. I'd also look at a classic file format, for example from Microsoft, to understand possible design decisions from back in the days that every byte mattered, and how these have been extended. For example: The WAV file format.
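To make 2) and 3) concrete, here is a sketch of length-prefixed blocks with a trailing MD5 using only JDK classes; the method signature and block layout are illustrative, not a finished format:

import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class LengthPrefixedWriter {
    public static void write(String path, byte[] xmlBlock, byte[][] files)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (FileOutputStream fos = new FileOutputStream(path);
             DigestOutputStream digestOut = new DigestOutputStream(fos, md5);
             DataOutputStream out = new DataOutputStream(digestOut)) {

            // Each block is written as <int length><bytes>, so no delimiter
            // (and no escaping) is ever needed.
            out.writeInt(xmlBlock.length);
            out.write(xmlBlock);
            for (byte[] file : files) {
                out.writeInt(file.length);
                out.write(file);
            }
            out.flush();

            // Append the MD5 of everything written so far; the reader recomputes
            // the digest over the same bytes and compares.
            fos.write(md5.digest());
        }
    }
}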
Let's see, this should be pretty straightforward.
Prerequisites:
0. should be cross-platform
1. information to be persisted includes a single POJO & arbitrary byte[]s (files actually; the POJO stores their names in a String[])
2. only sequential access is required
3. should be a way to check data consistency
4. should be small and fast
5. should prevent an average user with archiver + notepad from modifying the data
Well, guess what, you pretty much have it already; it's built into the platform: Object Serialization.
If you need to reduce the amount of data sent over the wire and provide a custom serialization (for instance, you could send only fields 1, 2, 3 for a given object, without the attribute names or anything similar, and read them back in the same sequence), you can use this somewhat hidden feature.
If you really need it in plain text, you can also encode it; it takes almost the same number of bytes.
For instance, this bean:
import java.io.*;

public class SimpleBean implements Serializable {
    private String website = "http://stackoverflow.com";

    public String toString() {
        return website;
    }
}
Could be represented like this:
rO0ABXNyAApTaW1wbGVCZWFuPB4W2ZRCqRICAAFMAAd3ZWJzaXRldAASTGphdmEvbGFuZy9TdHJpbmc7eHB0ABhodHRwOi8vc3RhY2tvdmVyZmxvdy5jb20=
See this answer
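For reference, a string like the one above can be produced along these lines with plain JDK classes (java.util.Base64 requires Java 8+; SimpleBean is the bean defined above):

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.util.Base64;

public class SerializeToBase64 {
    public static void main(String[] args) throws Exception {
        // Standard Java serialization into a byte array...
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(new SimpleBean());
        }
        // ...then Base64-encode the result for a plain-text representation
        String encoded = Base64.getEncoder().encodeToString(bytes.toByteArray());
        System.out.println(encoded);
    }
}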
Additionally, if you need a sound protocol you can also check out Protobuf, Google's internal exchange format.
You could use a zip (rar / 7z / tar.gz / ...) library. Many exist, most are well tested, and it'll likely save you some time.
Possibly not as much fun though.
I agree that it doesn't really sound like you need a new format, or a binary one.
If you truly want a binary format, why not consider one of these first:
Binary XML (Fast Infoset, Bnux)
Hessian
Google Protocol Buffers
But besides that, many textual formats should work just fine (or perhaps better) too: easier to debug, extensive tool support, and they compress to about the same size as binary (binary compresses poorly, and information theory suggests that for the same effective information, the same compression rate is achieved; this has been true in my testing).
So perhaps also consider:
JSON works well; binary support via Base64 (with, say, http://jackson.codehaus.org/)
XML is not too bad either; efficient streaming parsers, some with Base64 support (http://woodstox.codehaus.org/, "typed access API" under 'org.codehaus.stax2.typed.TypedXMLStreamReader').
So it kind of sounds like you just want to build something of your own. Nothing wrong with that, as a hobby, but if so you need to consider it as such.
It likely is not a requirement for the system you are building.
Perhaps you could explain how this is better than using an existing file format such as JAR.
Most standard file formats of this type just use a CRC, as it's faster to calculate. MD5 is more appropriate if you want to prevent deliberate modification.
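For comparison, computing a CRC over the payload takes only a few lines with java.util.zip.CRC32 (a minimal sketch):

import java.util.zip.CRC32;

public class ChecksumExample {
    public static long crcOf(byte[] payload) {
        // CRC32 catches accidental corruption cheaply, but unlike MD5 it is
        // trivial for a determined user to recompute after editing the file.
        CRC32 crc = new CRC32();
        crc.update(payload);
        return crc.getValue();
    }
}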
Bencode could be the way to go.
Here's an excellent implementation by Daniel Spiewak.
Unfortunately, the bencode spec doesn't support UTF-8, which is a showstopper for me.
I might come back to this later, but currently XML seems like a better choice (with blobs serialized as a Map).
