IIB - convert BLOB to String using Java Compute Node - java

So I have a simple message flow with a File Read node, parsing a .txt file (saying whatever) into BLOB, which I have to convert to a String in a Java Compute node. I've never used Java, so how do I go about this?
Then I have to give the string a new value (whatever) and replace the logical tree body element with the new value.
It should be simple, but it's still a steep learning curve for me, out of nowhere. All help is appreciated. :)

When parsing to BLOB, you end up with a byte array in assembly.getMessage().getRootElement().getLastChild().getLastChild(), and converting that to a String should be easy:
String(byte[] bytes, Charset charset)
You can get the charset from the Properties subtree.
You can read about accessing the message tree parts here:
https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/ac30330_.htm
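For reference, a minimal sketch of what the JavaCompute node's evaluate() method could look like under the assumptions above (BLOB domain, so the body is Root's last child's last child; the charset is hard-coded to UTF-8 here, whereas real code should derive it from Properties/CodedCharSetId; the class name is arbitrary):

import com.ibm.broker.javacompute.MbJavaComputeNode;
import com.ibm.broker.plugin.*;
import java.nio.charset.StandardCharsets;

public class BlobToStringJavaCompute extends MbJavaComputeNode {

    public void evaluate(MbMessageAssembly inAssembly) throws MbException {
        // Copy the incoming message so the input tree stays untouched
        MbMessage outMessage = new MbMessage(inAssembly.getMessage());
        MbMessageAssembly outAssembly = new MbMessageAssembly(inAssembly, outMessage);

        // BLOB domain: Root -> BLOB -> BLOB holds the byte array
        MbElement blobElement = outMessage.getRootElement().getLastChild().getLastChild();
        byte[] bytes = (byte[]) blobElement.getValue();

        // Convert to a String (assumption: UTF-8; look up Properties/CodedCharSetId in real code)
        String text = new String(bytes, StandardCharsets.UTF_8);

        // Give it a new value and write it back into the tree as bytes
        String newValue = "whatever";
        blobElement.setValue(newValue.getBytes(StandardCharsets.UTF_8));

        getOutputTerminal("out").propagate(outAssembly);
    }
}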

Just in case, one other way to do it would be to parse the input file directly with a dedicated parser (DFDL, ...). If one day your document is not in the format you expect, it will throw a proper error instead of crashing on the Java line that tries to cast something that is not a string to a string. It might be too complex (and unnecessary) for your case, but if you are learning, I would recommend you play with the parsers so you won't have to learn about them for future cases.
But reading as a BLOB is totally fine as long as you keep in mind that it means literally anything could be read, so the Java solution is totally fine as long as you handle it properly (try/catch/throw).

Related

performance and size limitations on HttpServletResponse.getOutputStream.print(string) vs getWriter(String)

For a web project I'm writing large sections of text to a web page (table), or even bigger content (could be several MB) to CSV files for download.
The Java method dealing with this receives the content as a StringBuilder, which originally (by the creator of this module) was being sent char by char in a loop:
response.getOutputStream().write(content.charAt(i)).
When questioned about the loop, the reason he gave was that he thought the string might be too big to write in one go (we're using Java 1.6).
I can't find any size restrictions anywhere, and then there is also the question of which method to use instead: print() or getWriter()?
The data in the string is all text.
He assumed wrong. If anything, it's inefficient, or at least pointless, to do that one character at a time. If you have a String in memory, you can write it out in one go without worrying.
If you're only writing text, use a Writer. OutputStream is for binary data (although you can wrap it in an OutputStreamWriter to convert between the two). See Writer or OutputStream?
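A hedged sketch of the one-go version (the method name writeCsv and the content type are placeholders, not taken from the original module):

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServletResponse;

public class CsvDownload {

    // Hypothetical helper: stream the already-built text to the client in one call
    static void writeCsv(HttpServletResponse response, StringBuilder content) throws IOException {
        response.setContentType("text/csv");
        response.setCharacterEncoding("UTF-8");
        PrintWriter out = response.getWriter(); // Writer handles character encoding for text
        out.write(content.toString());          // the whole String at once; no API-level size limit
        out.flush();
    }
}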

Parse byte array as HTTP object

In Java, how would I convert a byte array (TCP packet payload from a pcap file) into some kind of HTTP object that I can use to get HTTP headers and content body?
One of the frustrating things about Java is its total lack of unsigned types. So, a good place to start would be taking your byte array and converting it into a short array to make sure that you don't have any rollover problems (16 bits versus 8 bits per number).
From there, you could use a BufferedOutputStream to write your data out to a file and parse it with one of the Java built-in XML readers, such as JAXB or DOM. BufferedOutputStream writes the raw bytes directly to a file, and can take an int or a byte array as input. After you write it out, it should be very simple to parse the HTML out of it.
If you need any help with any of these individual steps, I'd be happy to help.
EDIT: as maerics has pointed out, perhaps I didn't grasp what you were asking. Regardless, writing your byte array with a BufferedOutputStream is the way to go in my opinion, and I could still help you build a parser if you want.
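As a rough sketch of those two steps using only the standard library (the class and method names here are made up for illustration):

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class PayloadDump {

    // Widen signed bytes to shorts so values 0x80-0xFF don't show up as negative numbers
    static short[] toUnsigned(byte[] payload) {
        short[] widened = new short[payload.length];
        for (int i = 0; i < payload.length; i++) {
            widened[i] = (short) (payload[i] & 0xFF);
        }
        return widened;
    }

    // Dump the raw payload to a file so it can be inspected or parsed later
    static void dump(byte[] payload, String path) throws IOException {
        try (BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(path))) {
            out.write(payload);
        }
    }
}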
JNetPcap can do exactly this.
Here are examples for
Opening a pcap file
Parsing http (in the example, we extract an image)
Drawback: parsing HTTP in this library is deprecated*, but that doesn't mean it doesn't work.
*I can't post any more links without more reputation. Sorry. You can Google for "jnetpcap http deprecated".

What kind of representation can '\r\x00\x00\x00' be (when usually I have hexadecimal codes like '\x0\x00\x00\x03')?

I'm using a program (KLEE) that gives me tests for C code.
I need to use the results in my program.
It is not readable information, but some of the solutions are hexadecimal data in the following format:
'\x0e\x00\x00\x00'
I have already asked how to convert that into an integer, and I found the solution.
I will also have to put this kind of result into structs; I will know the size but nothing about the fields or anything else about them.
I think I can solve this, but now the problem is that sometimes you can obtain things like:
'\n\x00\x00\x00'= 13
or
'\r\x00\x00\x00' = 10
And I couldn't find out which kind of representation they use to convert it into readable information.
Apparently I could solve this in Python with:
import struct
selection = struct.unpack('
I don't have any idea of Python, and I would like to find a solution in Java or C.
Thanks very much
The sequence \r\n is used by Windows systems to indicate a newline: the \n moves to the next line, and the \r moves the write pointer to the start of the line. I'm thinking that you might have had some character data containing a newline where each character was converted into a 32-bit integer value in little-endian format.
Hope this helps!
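In Java, decoding those 4-byte values is straightforward with ByteBuffer; a minimal sketch, assuming little-endian 32-bit values as the examples above suggest:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class KleeValues {
    public static void main(String[] args) {
        // '\n\x00\x00\x00' -> 0x0A 0x00 0x00 0x00, and '\r\x00\x00\x00' -> 0x0D 0x00 0x00 0x00
        byte[] lf = { 0x0A, 0x00, 0x00, 0x00 };
        byte[] cr = { 0x0D, 0x00, 0x00, 0x00 };

        int lfValue = ByteBuffer.wrap(lf).order(ByteOrder.LITTLE_ENDIAN).getInt();
        int crValue = ByteBuffer.wrap(cr).order(ByteOrder.LITTLE_ENDIAN).getInt();

        System.out.println(lfValue); // 10, the code point of '\n'
        System.out.println(crValue); // 13, the code point of '\r'
    }
}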

How do I identify that I am at the last byte of a serialized Java object?

Question
What are the terminating characters/byte sequences (if there are any) in serialized Java objects?
Background
I'm working on a small self-education project where I would like to serialize Java objects and write them to a stream, where they are then read and deserialized. Since I need to identify the borders between serialized objects and I can't be sure that the current object is not the last one, is there a terminating character that is always there that I can use as my identifier?
I noticed that there is a magic number ACED that allows me to identify the start of the object, so how do I identify the end?
EDIT:
If there is no terminating character, are there any safe terminating characters/sequences that I can use (insert) to identify the end of the object?
In theory you should always be able to find the end of an object; in practice you cannot. As I understand it, the problem is that customised writeObject implementations that don't call either defaultWriteObject or writeFields have a non-standard representation.
I've played about with serialisation in the past, including creating streams for use when I've been doing unusual things to the ObjectInputStream. It's not pleasant(!).
You can read the details in the spec, and the source is worth a read.
There are none. AFAIK the only requirement is that the deserialiser knows when to stop reading, when given a corresponding serialisation. Subject to that, the serialiser can write whatever it wants, in any position, not just the last.
If you're old skool, dump a 32-bit length field at the beginning and refuse to handle objects bigger than 4 gig.
Nu skool, you just make sure your read and your write logic are consistent and don't care about the length.
You can add a terminating object to your object stream. e.g. null or a special String.
However, I suggest that you instead convert each object stream to a byte[] and write the length of the byte[] followed by its data. This way each object stream is independent and you always know where it finishes.
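A minimal sketch of that length-prefix approach using only the JDK (the class and method names are made up for illustration):

import java.io.*;

public class FramedObjects {

    // Serialize one object to bytes, then write: [4-byte length][serialized bytes]
    static void writeFramed(DataOutputStream out, Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            oos.writeObject(obj);
        }
        byte[] bytes = buffer.toByteArray();
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Read the length, then exactly that many bytes, then deserialize them
    static Object readFramed(DataInputStream in) throws IOException, ClassNotFoundException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }
}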
Have you considered applying a record-marking layer similar to HTTP Chunked encoding?
The Chunked encoding is intended to solve a generalization of this scenario: identifying the end of a message of indeterminate length that both itself contains no identifiable end, and is embedded in a longer stream without ending it.

Developing a (file) exchange format for Java

I want to come up with a binary format for passing data between application instances in the form of POFs (Plain Old Files ;)).
Prerequisites:
should be cross-platform
information to be persisted includes a single POJO & arbitrary byte[]s (files actually; the POJO stores their names in a String[])
only sequential access is required
should be a way to check data consistency
should be small and fast
should prevent an average user with archiver + notepad from modifying the data
Currently I'm using DeflaterOutputStream + OutputStreamWriter together with InflaterInputStream + InputStreamReader to save/restore objects serialized with XStream, one object per file. Readers/Writers use UTF8.
Now I need to extend this to support what is described above.
My idea of format:
{serialized to XML object}
{delimiter}
{String file name}{delimiter}{byte[] file data}
{delimiter}
{another String file name}{delimiter}{another byte[] file data}
...
{delimiter}
{delimiter}
{MD5 hash for the entire file}
Does this look sane?
What would you use for a delimiter and how would you determine it?
The right way to calculate MD5 in this case?
What would you suggest to read on the subject?
TIA.
It looks INsane.
why invent a new file format?
why try to prevent only stupid users from changing the file?
why use a binary format (hard to compress)?
why use a format that cannot be parsed while being received? (the receiver has to receive the entire file before being able to act on it.)
XML is already a serialization format that is compressible. So you are serializing a serialized format.
Would serialization of the model (if you are into MVC) not be another way? I'd prefer to use things in the language (or standard libraries) rather than roll my own if possible. The only issue I can see with that is that the file size may be larger than you want.
1) Does this look sane?
It looks fairly sane. However, if you are going to invent your own format rather than just using Java serialization then you should have a good reason. Do you have any good reasons (they do exist in some cases)? One of the standard reasons for using XStream is to make the result human readable, which a binary format immediately loses. Do you have a good reason for a binary format rather than a human readable one? See this question for why human readable is good (and bad).
Wouldn't it be easier just to put everything in a signed jar. There are already standard Java libraries and tools to do this, and you get compression and verification provided.
2) What would you use for a delimiter and how would you determine it?
Rather than a delimiter I'd explicitly store the length of each block before the block. It's just as easy, and prevents you having to escape the delimiter if it comes up on its own.
3) The right way to calculate MD5 in this case?
There is example code here which looks sensible.
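For reference, a minimal sketch of hashing a file with the JDK's MessageDigest (the helper name md5Of is made up; it streams the file so it doesn't need to fit in memory):

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Example {

    static byte[] md5Of(String path) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = new DigestInputStream(Files.newInputStream(Paths.get(path)), md)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // reading is enough; the digest is updated as bytes pass through
            }
        }
        return md.digest();
    }
}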
4) What would you suggest to read on the subject?
On the subject of serialization? I'd read about the Java serialization, JSON, and XStream serialization so I understood the pros and cons of each, especially the benefits of human readable files. I'd also look at a classic file format, for example from Microsoft, to understand possible design decisions from back in the days that every byte mattered, and how these have been extended. For example: The WAV file format.
Let's see, this should be pretty straightforward.
Prerequisites:
0. should be cross-platform
1. information to be persisted includes a single POJO & arbitrary byte[]s (files actually; the POJO stores their names in a String[])
2. only sequential access is required
3. should be a way to check data consistency
4. should be small and fast
5. should prevent an average user with archiver + notepad from modifying the data
Well guess what, you pretty much have it already; it's built into the platform: Object Serialization.
If you need to reduce the amount of data sent on the wire and provide a custom serialization (for instance, you can send only 1, 2, 3 for a given object without the attribute names or anything similar, and read them back in the same sequence), you can use this somewhat hidden feature.
If you really need it in plain text you can also encode it; it takes almost the same amount of bytes.
For instance this bean:
import java.io.*;

public class SimpleBean implements Serializable {

    private String website = "http://stackoverflow.com";

    public String toString() {
        return website;
    }
}
Could be represented like this:
rO0ABXNyAApTaW1wbGVCZWFuPB4W2ZRCqRICAAFMAAd3ZWJzaXRldAASTGphdmEvbGFuZy9TdHJpbmc7eHB0ABhodHRwOi8vc3RhY2tvdmVyZmxvdy5jb20=
See this answer
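A sketch of how such a Base64 string can be produced with the JDK alone (java.util.Base64 is a Java 8+ assumption; the original answer predates it):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Base64;

public class SerializeToBase64 {
    public static void main(String[] args) throws IOException {
        // Serialize the bean to a byte[] in memory
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            oos.writeObject(new SimpleBean());
        }
        // Encode the serialized bytes as text
        String encoded = Base64.getEncoder().encodeToString(buffer.toByteArray());
        System.out.println(encoded);
    }
}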
Additionally, if you need a sound protocol you can also check out Protobuf, Google's data interchange format.
You could use a zip (rar / 7z / tar.gz / ...) library. Many exists, most are well tested and it'll likely save you some time.
Possibly not as much fun though.
I agree in that it doesn't really sound like you need a new format, or a binary one.
If you truly want a binary format, why not consider one of these first:
Binary XML (fast infoset, Bnux)
Hessian
Google protocol buffers
But besides that, many textual formats should work just fine (or perhaps better) too: they are easier to debug, have extensive tool support, and compress to about the same size as binary (binary compresses poorly, and information theory suggests that for the same effective information the same compressed size is achieved; this has been true in my testing).
So perhaps also consider:
Json works well; binary support via base64 (with, say, http://jackson.codehaus.org/); see the sketch after this list.
XML not too bad either; efficient streaming parsers, some with base64 support (http://woodstox.codehaus.org/, "typed access API" under 'org.codehaus.stax2.typed.TypedXMLStreamReader').
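A small sketch of that JSON route with Jackson, assuming the newer com.fasterxml package rather than the codehaus one linked above (the Payload class is a made-up example): byte[] fields are Base64-encoded automatically.

import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonExample {

    // Hypothetical container: one POJO field plus raw file data
    public static class Payload {
        public String name = "example.bin";
        public byte[] data = { 1, 2, 3, 4 };
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        String json = mapper.writeValueAsString(new Payload());
        System.out.println(json); // e.g. {"name":"example.bin","data":"AQIDBA=="}

        Payload back = mapper.readValue(json, Payload.class);
        System.out.println(back.data.length); // 4
    }
}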
So it kind of sounds like you just want to build something of your own. Nothing wrong with that, as a hobby, but if so you need to consider it as such.
It likely is not a requirement for the system you are building.
Perhaps you could explain how this is better than using an existing file format such as JAR.
Most standard file formats of this type just use a CRC, as it's faster to calculate. MD5 is more appropriate if you want to prevent deliberate modification.
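For comparison, the JDK already ships a CRC implementation; a minimal sketch of checksumming a byte[]:

import java.util.zip.CRC32;

public class ChecksumExample {

    // CRC32 of an in-memory byte[]: cheap to compute, but easy to forge deliberately
    static long crc32Of(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }
}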
Bencode could be the way to go.
Here's an excellent implementation by Daniel Spiewak.
Unfortunately, the bencode spec doesn't support UTF-8, which is a showstopper for me.
I might come back to this later, but currently XML seems like a better choice (with blobs serialized as a Map).
