Parse byte array as HTTP object - java

In Java, how would I convert a byte array (TCP packet payload from a pcap file) into some kind of HTTP object that I can use to get HTTP headers and content body?

One of the less lovely things about Java is its total lack of unsigned types. So a good place to start would be converting your byte array into a short array. Mask each value with & 0xFF so that a byte such as 0xFF becomes 255 rather than -1. That way you don't have any rollover problems (16 bits versus 8 bits per number).
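Here is a minimal sketch of that widening step, assuming the payload is already in a byte[]. A plain (short) cast would sign-extend, so each value is masked with & 0xFF first. The class name UnsignedBytes is just for illustration.

```java
import java.util.Arrays;

class UnsignedBytes {
    // Widen each signed byte to a short holding its unsigned value (0..255).
    // Without the mask, (short) (byte) 0xFF would be -1 instead of 255.
    static short[] toUnsigned(byte[] data) {
        short[] out = new short[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (short) (data[i] & 0xFF);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] payload = {(byte) 0x48, (byte) 0xFF, (byte) 0x80};
        System.out.println(Arrays.toString(toUnsigned(payload))); // [72, 255, 128]
    }
}
```

If you only need the value at the point of use, data[i] & 0xFF (an int) does the same job without copying the array.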
From there, you could use a BufferedOutputStream to write your data to a file and parse it with one of Java's XML APIs, such as JAXB or DOM. Note that BufferedOutputStream writes the raw bytes to the file (not hex), and its write methods accept a single int or a byte array. Once the data is written out, it should be fairly simple to parse the HTML out of it.
If you need any help with any of these individual steps, I'd be happy to help.
EDIT: as maerics has pointed out, perhaps I didn't grasp what you were asking. Regardless, writing your byte array with a BufferedOutputStream is the way to go in my opinion, and I could still help you build a parser if you want.
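If you would rather skip the intermediate file, here is a minimal sketch of parsing the header block straight from the byte array. It assumes the whole HTTP message (start line, headers and body) is in a single payload. That is often not true for TCP: a real parser would also need to reassemble segments and handle chunked or compressed bodies. The class, record and method names are hypothetical, and repeated headers such as Set-Cookie simply overwrite each other here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class RawHttpParser {
    // Hypothetical holder for a parsed HTTP message.
    static final class HttpMessage {
        final String startLine;            // e.g. "HTTP/1.1 200 OK" or "GET / HTTP/1.1"
        final Map<String, String> headers; // lower-cased field name -> value
        final byte[] body;                 // raw bytes after the blank line

        HttpMessage(String startLine, Map<String, String> headers, byte[] body) {
            this.startLine = startLine;
            this.headers = headers;
            this.body = body;
        }
    }

    // Index of the first CR LF CR LF (end of the header block), or -1 if absent.
    static int headerEnd(byte[] data) {
        for (int i = 0; i + 3 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n'
                    && data[i + 2] == '\r' && data[i + 3] == '\n') {
                return i;
            }
        }
        return -1;
    }

    static HttpMessage parse(byte[] data) {
        int end = headerEnd(data);
        if (end < 0) {
            throw new IllegalArgumentException("incomplete HTTP header block");
        }
        // ISO-8859-1 maps each byte to exactly one char, so nothing is lost.
        String head = new String(data, 0, end, StandardCharsets.ISO_8859_1);
        String[] lines = head.split("\r\n");
        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                // Field names are case-insensitive, so store them lower-cased.
                String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
                headers.put(name, lines[i].substring(colon + 1).trim());
            }
        }
        byte[] body = Arrays.copyOfRange(data, end + 4, data.length);
        return new HttpMessage(lines[0], headers, body);
    }

    public static void main(String[] args) {
        byte[] payload = ("HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/plain\r\n"
                + "Content-Length: 5\r\n"
                + "\r\n"
                + "hello").getBytes(StandardCharsets.ISO_8859_1);
        HttpMessage msg = parse(payload);
        System.out.println(msg.startLine);                    // HTTP/1.1 200 OK
        System.out.println(msg.headers.get("content-type")); // text/plain
        System.out.println(new String(msg.body, StandardCharsets.ISO_8859_1)); // hello
    }
}
```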

JNetPcap can do exactly this.
Here are examples for:
Opening a pcap file
Parsing HTTP (in the example, we extract an image)
Drawback: HTTP parsing in this library is deprecated*, but that doesn't mean it doesn't work.
*I can't post any more links without more reputation. Sorry. You can Google for "jnetpcap http deprecated".

Related

IIB - convert BLOB to String using Java Compute Node

So I have a simple message flow with a File Read node that parses a .txt file (containing whatever) as a BLOB, which I have to convert to a string in a Java Compute Node. I've never used Java; how do I go about this?
Then I have to give the string a new value (whatever) and set the body element of the logical tree to the new value.
It should be simple, but it's still a steep learning curve for me, out of nowhere. All help is appreciated. :)
When parsing to BLOB, you end up with a byte array in assembly.getMessage().getRootElement().getLastChild().getLastChild(), and converting that to String should be easy:
String(byte[] bytes, Charset charset)
You can get the charset from the Properties subtree.
You can read about accessing the message tree parts here:
https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/ac30330_.htm
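Once the bytes are out of the message tree, the conversion itself is plain Java. Here is a minimal sketch, assuming the BLOB bytes have already been retrieved and that the CodedCharSetId in Properties is 1208 (IBM's CCSID for UTF-8). The IIB-specific calls are left out, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BlobText {
    // Decode the BLOB bytes using the charset the message was written in.
    static String blobToString(byte[] blob, Charset charset) {
        return new String(blob, charset);
    }

    // Encode the new text so it can be set back as the BLOB body.
    static byte[] stringToBlob(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // Assumption: Properties.CodedCharSetId is 1208, i.e. UTF-8.
        Charset charset = StandardCharsets.UTF_8;
        byte[] blob = "saying whatever".getBytes(charset); // stands in for the BLOB
        String text = blobToString(blob, charset);
        byte[] newBlob = stringToBlob("whatever", charset);
        System.out.println(text + " -> " + new String(newBlob, charset));
    }
}
```

Use the same charset in both directions; otherwise non-ASCII characters in the new body will not match what downstream nodes expect.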
Just in case, another way to do it would be to parse the input file directly with a dedicated parser (DFDL, ...). If one day your document is not in the format you expected, the parser will throw a proper error instead of failing on the Java line that tries to turn something that is not text into a string. It might be too complex for your case (and unnecessary), but if you are learning, I would recommend playing with the parsers so you won't have to learn about them for future cases.
But reading as a BLOB is totally fine as long as you keep in mind that literally anything could be read. The Java solution is fine as long as you handle errors properly (try/catch/throw).
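One way to do the "handle it properly" part, assuming the BLOB is expected to be UTF-8: new String(bytes, charset) silently replaces malformed bytes with U+FFFD. A CharsetDecoder set to REPORT instead throws a CharacterCodingException that you can catch and turn into a proper flow error. This is a sketch, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecode {
    // Throws instead of substituting replacement characters.
    // REPORT is already the default for a fresh decoder; it is set here for clarity.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        byte[] bad = {(byte) 0xC3, (byte) 0x28}; // 0xC3 must be followed by a continuation byte
        try {
            System.out.println(decodeStrict(bad));
        } catch (CharacterCodingException e) {
            // In a Java Compute Node you would throw an exception here to fail the flow.
            System.out.println("rejected: " + e);
        }
    }
}
```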

java.io.StreamCorruptedException with ruby sender and java client

I have a Ruby program that writes data to a socket with sock.write, and I'm reading the data with ObjectInputStream in a Java program. I'm getting an invalid header error that translates to the first few characters of my stream.
I've read that if you use ObjectInputStream you must write with ObjectOutputStream, but since the writing side is in Ruby, I'm not sure how to accomplish this.
As you say, ObjectInputStream assumes that the bytes it's receiving have been formatted by an ObjectOutputStream. That is, it is expecting the incoming bytes to be a specific representation of a Java primitive or object.
Your Ruby code is unlikely to format bytes in such a way.
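You can reproduce the error without Ruby at all. ObjectInputStream's constructor immediately reads a 4-byte stream header and rejects anything that does not start with the serialization magic number. The message shows those first 4 bytes in hex, which is why it "translates to" the start of your text. This is a sketch; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.StreamCorruptedException;
import java.nio.charset.StandardCharsets;

class HeaderCheck {
    // Returns null if the bytes start with a valid Java serialization header,
    // otherwise the StreamCorruptedException message.
    static String headerError(byte[] bytes) throws IOException {
        try {
            // The constructor reads and checks the stream header right away.
            new ObjectInputStream(new ByteArrayInputStream(bytes)).close();
            return null;
        } catch (StreamCorruptedException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // Plain text, as a Ruby sock.write("hello") would send it.
        byte[] fromRuby = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(headerError(fromRuby)); // invalid stream header: 68656C6C

        // Bytes produced by ObjectOutputStream pass the check.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject("hello");
        }
        System.out.println(headerError(buf.toByteArray())); // null
    }
}
```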
You need to define exactly the byte format of the message passing from the Ruby to the Java process. You could tell us more about that message format, but it's likely you will need to use Java's ByteArrayInputStream (https://docs.oracle.com/javase/7/docs/api/java/io/ByteArrayInputStream.html). The data will come into the Java program as a raw array of bytes, and you will need to parse/unpack/process these bytes into whatever objects are appropriate.
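Since the format is yours to define, here is a minimal sketch of the Java side under one assumed format: each message is a 4-byte big-endian length followed by UTF-8 text. It reads with DataInputStream on top of the raw stream. ByteArrayInputStream only simulates the socket here. The class name and the Ruby lines in the comment are illustrations, not taken from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class RubyFrameReader {
    // Assumed wire format: 4-byte big-endian length, then that many UTF-8 bytes.
    // A matching (hypothetical) Ruby sender:
    //   sock.write([msg.bytesize].pack("N"))
    //   sock.write(msg)
    static String readMessage(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int length = in.readInt(); // big-endian, like Ruby's "N" directive
        if (length < 0) {
            throw new IOException("negative message length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // keeps reading until all bytes have arrived
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // What the Ruby sender above would put on the wire for "hi".
        byte[] wire = {0, 0, 0, 2, 'h', 'i'};
        // With a real connection, pass socket.getInputStream() instead.
        System.out.println(readMessage(new ByteArrayInputStream(wire))); // hi
    }
}
```

The length prefix matters because TCP does not preserve message boundaries; without it, the reader cannot tell where one sock.write ends and the next begins.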
Unless performance is critical, you'd probably be best off using JSON or YAML as the intermediate format. They would make it simple to send simple objects such as strings, arrays, and hashes (maps).

Why should we avoid using byte streams much in Java?

The Sun docs say we shouldn't use byte streams, because a byte stream
actually represents a kind of low-level I/O that you should avoid.
What exactly is low-level I/O, and what exactly is the problem with using byte streams?
So the Java docs say:
CopyBytes seems like a normal program, but it actually represents a
kind of low-level I/O that you should avoid. Since xanadu.txt contains
character data, the best approach is to use character streams, as
discussed in the next section. There are also streams for more
complicated data types. Byte streams should only be used for the most
primitive I/O.
Byte streams give you access to the file as it is. Just the bytes. No interpretation of any kind. That means no character set conversion, no handling of ints or floats in binary or ASCII representation, no dealing with byte orders, or any of that. The higher-level streams provide some of these.
Of course, a program that copies a file is actually a pretty good example of something that needs a raw byte stream, because it doesn't need or want to do any kind of interpretation of the data; it just wants to copy it verbatim.
So what they really mean is: use byte streams if you think you need them, but be sure you know what you are doing :)
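A verbatim copy is exactly the case where a byte stream is right. Here is a minimal sketch in the spirit of the tutorial's CopyBytes, using a buffer instead of one byte at a time. For real files you would pass a FileInputStream and FileOutputStream opened in try-with-resources. On Java 9 and later, InputStream.transferTo does the same loop for you. The class name is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class CopyBytesSketch {
    // Copies every byte unchanged: no charset decoding, no number parsing.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Arbitrary binary data, including bytes that are not valid text.
        byte[] data = {0, (byte) 0xFF, (byte) 0xC3, 10, 13};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), out);
        System.out.println(copied + " bytes copied"); // 5 bytes copied
    }
}
```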
The suggestion is in the context of reading a text file that is discussed in the tutorial. For that purpose it is better to use character streams to handle character set translation properly:
The Java platform stores character values using Unicode conventions.
Character stream I/O automatically translates this internal format to
and from the local character set.
A program that uses character streams in place of byte streams
automatically adapts to the local character set and is ready for
internationalization — all without extra effort by the programmer.
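To see the translation the tutorial describes, this small sketch reads the same UTF-8 bytes twice: once through a byte stream and once through a character stream. It passes UTF-8 explicitly to InputStreamReader rather than relying on the platform's local character set. That encoding is an assumption about the data, and the class name is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class BytesVsChars {
    // Byte stream: every read() returns one raw byte (0..255).
    static int countBytes(byte[] data) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    // Character stream: bytes are decoded as UTF-8, and read() returns one char.
    static int countChars(byte[] data) throws IOException {
        Reader in = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "cafe" with an accented e; U+00E9 takes two bytes in UTF-8.
        byte[] text = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(text) + " bytes, " + countChars(text) + " chars"); // 5 bytes, 4 chars
    }
}
```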

Java and Binary data in the context of sockets

Java newbie here. Are there any helper functions to serialize data in and out of byte arrays? I am writing a Java package that implements a network protocol, so I have to write some typical variables like a version (1 byte), a sequence number (long) and binary data (bytes) in a loop. How do I do this in Java? Coming from C, I am thinking of creating a byte array of the required size and then, since there is no memcpy(), converting the long into a temporary byte array and copying it into the actual byte array. That seems very inefficient and also really error-prone. Is there a class I could use to marshal and unmarshal parameters to a byte array?
Also, why do all the Socket classes deal only with char[] and not byte[]? A socket by definition has to deal with binary data too. How is this done in Java?
I am sure what I am missing is the Java mindset. I'd appreciate it if someone could point it out to me.
EDIT: I did look at DataOutputStream and DataInputStream, but I can only convert the bytes to a String, not to a byte[], which means information might be lost in the conversion when writing to a socket.
Pav
Have a look at DataInputStream, DataOutputStream, ObjectInputStream and ObjectOutputStream. Check first if the layout of the data is acceptable to you. Also, Serialization.
Sockets neither deal with char[] nor with byte[] but with InputStream and OutputStream which are used to read and write bytes.
If you are sending the data over a socket, then you don't need a temporary byte array at all; you can wrap the socket's OutputStream with DataOutputStream or ObjectOutputStream and just write what you want to write.
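Here is a minimal sketch of that approach for the fields in the question. It assumes a hypothetical layout: version (1 byte), sequence number (8 bytes), payload length (4 bytes), then the payload. A ByteArrayOutputStream stands in for socket.getOutputStream() so the example runs on its own. With a real socket you would usually also put a BufferedOutputStream between the socket and the DataOutputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

class ProtocolCodec {
    // Hypothetical decoded message.
    static final class Message {
        final int version;
        final long seq;
        final byte[] payload;

        Message(int version, long seq, byte[] payload) {
            this.version = version;
            this.seq = seq;
            this.payload = payload;
        }
    }

    // Writes: version (1 byte), sequence number (8 bytes, big-endian),
    // payload length (4 bytes, big-endian), then the payload bytes.
    static void writeMessage(OutputStream raw, int version, long seq, byte[] payload)
            throws IOException {
        DataOutputStream out = new DataOutputStream(raw);
        out.writeByte(version);
        out.writeLong(seq);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    // Reads the same layout back; readFully waits until the whole payload arrives.
    static Message readMessage(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int version = in.readUnsignedByte();
        long seq = in.readLong();
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload);
        return new Message(version, seq, payload);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeMessage(wire, 1, 42L, new byte[] {1, 2, 3});
        System.out.println(wire.size() + " bytes on the wire"); // 16 bytes on the wire
        Message m = readMessage(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(m.version + " " + m.seq + " " + Arrays.toString(m.payload)); // 1 42 [1, 2, 3]
    }
}
```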
There might be an aspect I've missed that means you do actually need temporary byte arrays. If so, look at ByteArrayOutputStream. Also, there's no memcpy(), sure, but there is System.arraycopy.
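If you do need the message as a byte[], java.nio.ByteBuffer is another option. It is not mentioned in the answer above, so treat this as an alternative rather than the answer's method. This sketch builds a hypothetical layout (version, sequence number, payload length, payload) with ByteBuffer and uses System.arraycopy where C would use memcpy(). The class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class FrameBuilder {
    // Hypothetical layout: version (1 byte), sequence number (8 bytes),
    // payload length (4 bytes), then the payload.
    static byte[] encode(byte version, long seq, byte[] payload) {
        // ByteBuffer is big-endian by default, like DataOutputStream.
        byte[] header = ByteBuffer.allocate(1 + 8 + 4)
                .put(version)
                .putLong(seq)
                .putInt(payload.length)
                .array();
        // System.arraycopy plays the role of C's memcpy.
        byte[] frame = new byte[header.length + payload.length];
        System.arraycopy(header, 0, frame, 0, header.length);
        System.arraycopy(payload, 0, frame, header.length, payload.length);
        return frame;
    }

    // Decoding: wrap the array and read a field back at its offset.
    static long sequenceOf(byte[] frame) {
        return ByteBuffer.wrap(frame).getLong(1); // absolute read at offset 1
    }

    public static void main(String[] args) {
        byte[] frame = encode((byte) 1, 42L, new byte[] {7, 8});
        System.out.println(Arrays.toString(frame));
        System.out.println(sequenceOf(frame)); // 42
    }
}
```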
As above, DataInputStream and DataOutputStream are exactly what you are looking for. Re your comment about String: if you're planning to use Java Strings over the wire, you're not designing a network protocol, you're designing a Java protocol. There are readUTF() and writeUTF() if you're sure the other end is Java or if you can code the other end to understand their format. Or you can send the string as bytes along with the appropriate charset, or predefine the charset for the entire protocol if that makes sense.
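This small sketch makes the difference concrete by comparing writeUTF() with an explicit-charset encoding. writeUTF() writes a 2-byte big-endian length followed by Java's modified UTF-8. Modified UTF-8 encodes U+0000 as the two bytes C0 80, for example, so the other end has to understand that format. The 4-byte length prefix in the second method is an assumed protocol choice, not something the answer prescribes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StringOnWire {
    // Java-specific framing: 2-byte length + modified UTF-8.
    static byte[] withWriteUtf(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF(s);
        out.flush();
        return buf.toByteArray();
    }

    // Portable alternative (assumed protocol): 4-byte length + standard UTF-8 bytes.
    static byte[] withExplicitCharset(String s) throws IOException {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(encoded.length);
        out.write(encoded);
        out.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(withWriteUtf("hi")));        // [0, 2, 104, 105]
        System.out.println(Arrays.toString(withExplicitCharset("hi"))); // [0, 0, 0, 2, 104, 105]
    }
}
```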

How should I handle searching through byte arrays in Java?

Preliminary: I am writing my own HTTP client in Java. I am trying to parse out the contents of a chunked encoding.
Here is my dilemma: since I am trying to parse out chunked HTTP transfer encoding with a gzip payload, there is a mix of ASCII and binary. I can't just take the HTTP response content, convert it to a string and use StringUtils, since the binary data can easily contain NUL (0x00) bytes. So what I need to do is some basic parsing of each chunk and its chunk length (as per the chunked transfer coding in the HTTP/1.1 spec).
Are there any helpful ways of searching through byte arrays of binary/part ascii data for certain patterns (like a CR LF) (instead of just a single byte) ? Or must I write the for loops for this?
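The standard library has no indexOf for byte arrays, but the loop is short. Here is a minimal sketch of a naive search for a multi-byte pattern such as CR LF. It is O(n*m), which is fine for a 2-byte pattern. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class ByteSearch {
    static final byte[] CRLF = {'\r', '\n'};

    // Returns the first index >= from where pattern occurs in data, or -1.
    static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Chunk-size line followed by binary data that contains a 0x00 byte.
        byte[] data = {'1', 'a', '\r', '\n', 0x1f, (byte) 0x8b, 0x00, '\r', '\n'};
        int lineEnd = indexOf(data, CRLF, 0);
        String sizeLine = new String(data, 0, lineEnd, StandardCharsets.US_ASCII);
        System.out.println(sizeLine + " -> " + Integer.parseInt(sizeLine, 16)); // 1a -> 26
    }
}
```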
Thus, you basically need a ChunkedInputStream. Google gives enough hints. Apache HttpCore's variant is pretty concise. I'd suggest just using it rather than reinventing your own client.
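To show what such a stream does internally, here is a minimal sketch of decoding a chunked body from an InputStream. It is not Apache HttpCore's implementation, and the class and method names are hypothetical. It reads each hex size line, ignores chunk extensions after ';', copies exactly that many bytes with readFully, and skips any trailer fields. The result is the still-compressed payload, which you could then pass through java.util.zip.GZIPInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ChunkedDecoder {
    // Reads one CRLF-terminated line; size and trailer lines are ASCII.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r') {
                if (in.read() == '\n') {
                    return sb.toString();
                }
                throw new IOException("expected LF after CR");
            }
            sb.append((char) b);
        }
        throw new EOFException("stream ended inside a line");
    }

    // Returns the concatenated chunk data (binary-safe; NUL bytes are kept).
    static byte[] decode(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';'); // drop chunk extensions
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                break; // last chunk
            }
            byte[] chunk = new byte[size];
            in.readFully(chunk);
            body.write(chunk);
            if (!readLine(in).isEmpty()) {
                throw new IOException("missing CRLF after chunk data");
            }
        }
        // Skip optional trailer fields up to the terminating empty line.
        while (!readLine(in).isEmpty()) {
        }
        return body.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] decoded = decode(new ByteArrayInputStream(wire));
        System.out.println(new String(decoded, StandardCharsets.ISO_8859_1)); // Wikipedia
    }
}
```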
If you really, really want to implement your own HTTP client stack from the ground up, try a Google search for "+java +bytestring". One of the first hits I got was this ByteString class, which would seem to provide the kind of functionality you are asking for.
But I think BalusC's approach is better.
