Best way to encode minimalistic messages - java

I need to create a little client/server application that should transfer data like this:
statistic <- command type identifier
15.23.63.12 <- Statistic for this IP address
increase <- the kind of action that should be done with the address's statistic
6 <- Just some other parameters...
So there has to be one string that identifies the type of the command, followed by some parameters that depend on the command type. These parameters differ, but they are always primitive data types: most probably String, Short, Byte, Integer, and so on...
So there are instruction sets of different primitive data types.
My question is: Is it the best way to wrap the socket's streams in DataInput/OutputStreams and just read/write from them? Or is it better to save the messages into a byte array and then wrap this byte array in a ByteArrayInputStream and wrap the ByteArrayInputStream in a DataInputStream that I can read from? Or should I wrap the byte array in a ByteBuffer?
And if I wanted to encrypt my messages, would I have to save them as a byte array, then decrypt the byte array and then wrap it into some kind of data reader?

You could do this more efficiently, if that's what you're asking.
I would model the command type identifier as a byte, provided that you don't have more than 256 distinct commands:
byte cmd_statistic = 0;
byte cmd_nonstatistic = 1;
Then each ip address could be modeled as 4 bytes, like this:
byte[] ip0 = new byte[]{15, 23, 63, 12};
byte[] ip1 = new byte[]{15, 23, 63, 13};
The action could also be bytes:
byte action_increase = 0;
byte action_decrease = 1;
And if you could model the last parameters as bytes, you could get away with just using an InputStream (is) and an OutputStream, like this:
// Code for reading; writing is very similar
byte cmd = (byte) is.read(); // read() returns -1 at end of stream
byte[] ip = new byte[4];
is.read(ip, 0, 4); // may return fewer than 4 bytes on a socket; check the return value or loop
byte action = (byte) is.read();
byte extra = (byte) is.read();
This is also easy to keep in one large byte[], which is easier to use for encryption.
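To connect this with the question about DataInput/OutputStreams, here is a minimal sketch that writes the same fields into a byte array and reads them back. The class name `StatMessage`, the constants and the field layout (1 byte command, 4 bytes address, 1 byte action, 1 byte extra) are assumptions for illustration, not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper: names and field layout are assumptions for illustration.
class StatMessage {
    static final byte CMD_STATISTIC = 0;
    static final byte ACTION_INCREASE = 0;

    // Encodes: 1 byte command, 4 bytes IPv4 address, 1 byte action, 1 byte extra.
    static byte[] encode(byte cmd, byte[] ip, byte action, byte extra) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(cmd);
        out.write(ip, 0, 4);
        out.writeByte(action);
        out.writeByte(extra);
        out.flush();
        return bytes.toByteArray();
    }

    // Decodes the same layout; readFully keeps reading until all 4 address bytes are there.
    static String decode(byte[] message) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(message));
        int cmd = in.readUnsignedByte();
        byte[] ip = new byte[4];
        in.readFully(ip);
        int action = in.readUnsignedByte();
        int extra = in.readUnsignedByte();
        return "cmd=" + cmd
                + " ip=" + (ip[0] & 0xFF) + "." + (ip[1] & 0xFF) + "." + (ip[2] & 0xFF) + "." + (ip[3] & 0xFF)
                + " action=" + action + " extra=" + extra;
    }
}
```

The same DataInputStream/DataOutputStream calls work directly on a socket's streams. Going through a byte[] first is mainly useful when the whole message has to be encrypted or decrypted as one unit.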

Related

Is using a String in Java to hold binary data wrong?

I need to pass binary data (read from a file) from Java to C++ (using JNI), so I have a C++ function that expects a string (because in C++ a string is just a char array).
I read my binary file in java using the following code :
byte[] buffer = new byte[512];
FileInputStream in = new FileInputStream("some_file");
int rc = in.read(buffer);
while(rc != -1)
{
// rc should contain the number of bytes read in this operation.
// do stuff...
// next read
rc = in.read(buffer);
String s = new String(buffer);
// here i call my c++ function an pass "s"
}
I'm worried about the line that creates the string. What actually happens when I put the buffer inside a String? It seems that when the data arrives at my C++ code, it is different from what I expect it to be.
Does the String constructor change the data somehow?
Strings are not char arrays at all. They are complex Unicode beasts with semantic interactions between the codepoints, different binary encodings, etc. This is true for all programs. The only thing that's different about C++ is that they haven't finished complaining and started doing things about it yet.
In all languages, for binary data, use an explicit binary data type, like array of bytes.
A C++ char is a Java byte. Both are 8-bit. A Java char is a 16-bit value.
Ignore that C++ calls it char. Give it a Java byte[].
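To make the effect concrete, here is a small sketch showing that arbitrary bytes do not survive a trip through a String. It assumes UTF-8 as the charset (the code in the question uses the platform default, which may differ), and the class name `StringCorruptionDemo` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; UTF-8 is an assumption about the charset in use.
class StringCorruptionDemo {
    // Returns true if the bytes survive a byte[] -> String -> byte[] round trip.
    static boolean survivesUtf8RoundTrip(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_8);
        return Arrays.equals(data, s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] text = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x00, (byte) 0x80};
        System.out.println(survivesUtf8RoundTrip(text));   // true
        System.out.println(survivesUtf8RoundTrip(binary)); // false: invalid sequences are replaced with U+FFFD
    }
}
```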

Base64 binary data type in java

I need to attach a Base64 binary element to a SOAP message. I'm doing a dry run to check whether I can convert a value read from a file into Base64 binary.
Here is the code. In the last line I try to print the type of encoded1 (I assume it should be Base64 binary values), but I get the following output: "Attachment[B". How can I confirm that it is a Base64 binary string?
Path path = Paths.get("c:/tomcatupload/text.csv");
byte[] attachment1 = Files.readAllBytes(path);
byte[] encoded1 = Base64.encode(attachment1);
System.out.println("Attachment"+ encoded1.getClass().getName());
Base-64 encoding is a way to convert arbitrary bytes to bytes that fit in a range of text characters in ASCII encoding. This is done without any interpretation whatsoever - raw bytes are converted to base-64 on sender's end; the receiver converts them back to a stream of bytes, and that's all there is to it.
When your code prints encoded1.getClass().getName(), all it gets is the runtime class name of the byte array, which Java reports as [B. In order to interpret the data encoded in base-64 as something meaningful to your program, you need to know the format of the underlying data transported as base-64. Once the bytes are delivered to you (in your case, that's the encoded1 array of bytes), you need to decide what's inside and act accordingly.
For example, if a serialized Java object is sent to you as base-64, you need to decode encoded1 back to raw bytes, make an in-memory stream from them, and read the object using the regular serialization mechanism:
// JDK decoder (Java 8+); use the decoder that matches your own Base64 library
byte[] decoded = java.util.Base64.getDecoder().decode(encoded1);
ByteArrayInputStream memStream = new ByteArrayInputStream(decoded);
ObjectInputStream objStream = new ObjectInputStream(memStream);
Object attachedObject = objStream.readObject();
A quick sanity check on Base64.encode(): for n input bytes, padded Base64 produces at least 4 * ceil(n / 3) output bytes, so for any non-empty input encoded1 must be longer than attachment1. This only shows that something was encoded, not that the result is correct.
To understand how the encoding works, see:
http://en.wikipedia.org/wiki/Base64
By the way, your last statement doesn't print the array content. It prints the name of the class to which encoded1 belongs.
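One way to confirm that the bytes really are Base64 is to decode them again and compare with the input. The sketch below uses the JDK's java.util.Base64 (Java 8+), which is an assumption: the Base64 class in the question may come from a different library. The class name `Base64Check` and the sample CSV content are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper built on java.util.Base64, not on the library used in the question.
class Base64Check {
    static byte[] encode(byte[] raw) {
        return Base64.getEncoder().encode(raw);
    }

    // True if decoding the encoded bytes gives back exactly the input.
    static boolean roundTrips(byte[] raw) {
        byte[] decoded = Base64.getDecoder().decode(encode(raw));
        return Arrays.equals(raw, decoded);
    }

    public static void main(String[] args) {
        byte[] raw = "a,b,c\n1,2,3\n".getBytes(StandardCharsets.US_ASCII);
        byte[] encoded = encode(raw);
        // Base64 output is plain ASCII, so it can be printed as text.
        System.out.println(new String(encoded, StandardCharsets.US_ASCII));
        // Padded Base64 length is 4 * ceil(n / 3).
        System.out.println(encoded.length == 4 * ((raw.length + 2) / 3));
        System.out.println(roundTrips(raw));
    }
}
```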

How to convert byte array in String format to byte array?

I have created a byte array of a file.
FileInputStream fileInputStream=null;
File file = new File("/home/user/Desktop/myfile.pdf");
byte[] bFile = new byte[(int) file.length()];
try {
fileInputStream = new FileInputStream(file);
fileInputStream.read(bFile);
fileInputStream.close();
}catch(Exception e){
e.printStackTrace();
}
Now, I have an API that expects JSON input, in which I have to put the above byte array as a string. After reading the byte array in string form, I need to convert it back to a byte array.
So, please help me find:
1) How to convert byte array to String and then back to the same byte array?
The general problem of byte[] <-> String conversion is easily solved once you know the actual character set (encoding) that was used to "serialize" a given text to a byte stream, or that the peer component needs in order to accept a given byte stream as text input; see the perfectly valid answers already given on this. I've seen a lot of problems caused by a poor understanding of character sets (and text encoding in general) in enterprise Java projects, even with experienced software developers, so I really suggest diving into this quite interesting topic. It is generally key to keep the character encoding as some sort of "meta" information with your binary data if it represents text in some way. Hence the header in, for example, XML files, or even suffixes as parts of file names, as is sometimes seen with Apache htdocs contents, not to mention filesystem-specific ways to attach metadata to files. Also, when communicating via, say, HTTP, the Content-Type header often contains additional charset information to allow for correct interpretation of the actual content.
However, since in your example you read a PDF file, I'm not sure if you can actually expect pure text data anyway, regardless of any character encoding.
So in this case - depending on the rest of the application you're working on - you may want to transfer binary data within a JSON string. A common way to do so is to convert the binary data to Base64 and, once transferred, recover the binary data from the received Base64 string.
The question "How do I convert a byte array to Base64 in Java?" is a good starting point for such a task.
String class provides an overloaded constructor for this.
String s = new String(byteArray, "UTF-8");
byteArray = s.getBytes("UTF-8");
Providing an explicit encoding charset is encouraged because different encoding schemes may have different byte representations.
Also, your input stream may not read all the contents in one go. You have to read in a loop until there is nothing left to read. The documentation says that read() returns the number of bytes read:
Reads up to b.length bytes of data from this input stream into an array of bytes. This method blocks until some input is available.
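A read loop along those lines could look like the following sketch. The class and method names (`ReadAll`, `readAll`) and the 4096-byte chunk size are arbitrary choices for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; names and chunk size are arbitrary.
class ReadAll {
    // Reads until end of stream, because a single read() may return fewer bytes than requested.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n); // copy only the bytes actually read
        }
        return out.toByteArray();
    }
}
```

For a whole file, Files.readAllBytes(path) from java.nio.file does the same job in one call.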
String.getBytes() and String(byte[] bytes) are methods to consider.
Convert byte array to String
String s = new String(bFile , "ISO-8859-1" );
Convert String to byte array
byte bArray[] =s.getBytes("ISO-8859-1");
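ISO-8859-1 works for this purpose because it maps each of the 256 byte values to exactly one character and back. The sketch below checks that property; the class name `Latin1RoundTrip` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class for the ISO-8859-1 round trip.
class Latin1RoundTrip {
    static byte[] roundTrip(byte[] data) {
        String s = new String(data, StandardCharsets.ISO_8859_1);
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Builds an array holding every possible byte value and checks that it comes back unchanged.
    static boolean allByteValuesSurvive() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        return Arrays.equals(all, roundTrip(all));
    }
}
```

Note that such a string can contain control characters, which JSON requires to be escaped, so Base64 is usually the safer choice for a JSON payload.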

create a specific number of bytes from a string

I would like to know if there is a way to create a specific number of bytes from a string.
I am trying to unit test some part of my code and it can take an array of bytes or a string.
But the data that I am getting will consist of exactly 132 bytes (where each data point is a two-byte signed integer in two's complement).
The data I am retrieving will consist of multiple data points in the above bytes where each data point is 2bytes.
I am planning to unit test my code. So I would like to create a string and convert to byte array and pass it so that I can cross check my data points again.
Also are there any tools available by which I can send binary data via a com port. I was looking at eltima software serial port.
This is the way I am doing it now, but I am looking for an easier way:
final String MACID = new Character((char) 48).toString();
final String STX = new Character((char) 2).toString();
final String str = MACID + STX;
final byte[] utf8Bytes = str.getBytes("UTF-8");
this would surely just take 2 bytes.
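For test data made of two-byte signed integers, building the array from numbers with a ByteBuffer tends to be simpler than going through a String, and it avoids UTF-8 turning characters above 127 into more than one byte. This is only a sketch: the class name `TestDataBuilder` is hypothetical, and big-endian byte order is an assumption, since the question does not say which order the device uses.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical test helper; the byte order is an assumption.
class TestDataBuilder {
    // Packs each value as a 2-byte two's-complement integer.
    static byte[] pack(short[] points) {
        ByteBuffer buf = ByteBuffer.allocate(points.length * 2).order(ByteOrder.BIG_ENDIAN);
        for (short p : points) {
            buf.putShort(p);
        }
        return buf.array();
    }

    // Reverses pack(), so a test can cross-check the data points.
    static short[] unpack(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        short[] points = new short[data.length / 2];
        for (int i = 0; i < points.length; i++) {
            points[i] = buf.getShort();
        }
        return points;
    }
}
```

With 66 data points, pack() returns the 132 bytes described above.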

How to get data out of network packet data in Java

In C if you have a certain type of packet, what you generally do is define some struct and cast the char * into a pointer to the struct. After this you have direct programmatic access to all data fields in the network packet. Like so :
struct rdp_header {
int version;
char serverId[20];
};
When you get a network packet you can do the following quickly :
char * packet;
// receive packet
struct rdp_header * pckt = (struct rdp_header *) packet;
printf("Servername : %20.20s\n", pckt->serverId);
This technique works really great for UDP based protocols, and allows for very quick and very efficient packet parsing and sending using very little code, and trivial error handling (just check the length of the packet). Is there an equivalent, just as quick way in java to do the same ? Or are you forced to use stream based techniques ?
Read your packet into a byte array, and then extract the bits and bytes you want from that.
Here's a sample, sans exception handling:
DatagramSocket s = new DatagramSocket(port);
DatagramPacket p;
byte buffer[] = new byte[4096];
while (true) {
p = new DatagramPacket(buffer, buffer.length);
s.receive(p);
// your packet is now in buffer[];
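// mask each byte to avoid sign extension; bytes are in network (big-endian) order
int version = ((buffer[0] & 0xFF) << 24) | ((buffer[1] & 0xFF) << 16) | ((buffer[2] & 0xFF) << 8) | (buffer[3] & 0xFF);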
byte[] serverId = new byte[20];
System.arraycopy(buffer, 4, serverId, 0, 20);
// and process the rest
}
In practice you'll probably end up with helper functions to extract data fields in network order from the byte array, or as Tom points out in the comments, you can use a ByteArrayInputStream, from which you can construct a DataInputStream, which has methods to read structured data from the stream:
...
while (true) {
p = new DatagramPacket(buffer, buffer.length);
s.receive(p);
ByteArrayInputStream bais = new ByteArrayInputStream(buffer, 0, p.getLength()); // only the bytes actually received
DataInput di = new DataInputStream(bais);
int version = di.readInt();
byte[] serverId = new byte[20];
di.readFully(serverId);
...
}
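java.nio.ByteBuffer is another way to pick fields out of the received bytes, and it is probably the closest Java gets to the C struct overlay. The sketch below assumes the layout from the question (a 4-byte big-endian version followed by a 20-byte server id) and assumes the id is ASCII text; the class `RdpHeader` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical value class; layout and text encoding are assumptions.
class RdpHeader {
    final int version;
    final String serverId;

    RdpHeader(int version, String serverId) {
        this.version = version;
        this.serverId = serverId;
    }

    // Parses the first 'length' bytes of 'packet' (for UDP, pass DatagramPacket.getLength()).
    static RdpHeader parse(byte[] packet, int length) {
        if (length < 24) {
            throw new IllegalArgumentException("packet too short: " + length);
        }
        ByteBuffer buf = ByteBuffer.wrap(packet, 0, length); // big-endian by default
        int version = buf.getInt();
        byte[] id = new byte[20];
        buf.get(id);
        // trim() drops trailing NUL or space padding
        return new RdpHeader(version, new String(id, StandardCharsets.US_ASCII).trim());
    }
}
```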
I don't believe this technique can be done in Java, short of using JNI and actually writing the protocol handler in C. The other way to do the technique you describe is variant records and unions, which Java doesn't have either.
If you had control of the protocol (it's your server and client) you could use serialized objects (inc. xml), to get the automagic (but not so runtime efficient) parsing of the data, but that's about it.
Otherwise you're stuck with parsing Streams or byte arrays (which can be treated as Streams).
Mind you the technique you describe is tremendously error prone and a source of security vulnerabilities for any protocol that is reasonably interesting, so it's not that great a loss.
I wrote something to simplify this kind of work. Like most tasks, it was much easier to write a tool than to try to do everything by hand.
It consisted of two classes. Here's an example of how it was used:
// Resulting byte array is 9 bytes long.
byte[] ba = new ByteArrayBuilder()
.writeInt(0xaaaa5555) // 4 bytes
.writeByte(0x55) // 1 byte
.writeShort(0x5A5A) // 2 bytes
.write( (new BitBuilder()) // 2 bytes---0xBA12
.write(3, 5) // 101 (3 bits value of 5)
.write(2, 3) // 11 (2 bits value of 3)
.write(3, 2) // 010 (...)
.write(2, 0) // 00
.write(2, 1) // 01
.write(4, 2) // 0010
).getBytes();
I wrote the ByteArrayBuilder to simply accumulate bits. I used a method chaining pattern (Just returning "this" from all methods) to make it easier to write a bunch of statements together.
All the methods in the ByteArrayBuilder were trivial, just like 1 or 2 lines of code (I just wrote everything to a data output stream)
This is to build a packet, but tearing one apart shouldn't be any harder.
The only interesting method in BitBuilder is this one:
public BitBuilder write(int bitCount, int value) {
int bitMask=0xffffffff;
bitMask <<= bitCount; // If bitCount is 4, bitMask is now fffffff0
bitMask = ~bitMask; // and now it's 0000000f, a great mask
bitRegister <<= bitCount; // make room
bitRegister |= (value & bitMask); // or in the value (masked for safety)
bitsWritten += bitCount;
return this;
}
Again, the logic could be inverted very easily to read a packet instead of build one.
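For reference, here is a self-contained sketch of what such a bit builder might look like. Only the masking logic follows the write method shown above; the class name `BitBuilderSketch`, the 32-bit limit and the getBytes implementation are assumptions, not the original code.

```java
// Hypothetical reconstruction: only the masking logic comes from the answer above.
class BitBuilderSketch {
    private int bitRegister = 0;
    private int bitsWritten = 0;

    // Appends the lowest 'bitCount' bits of 'value' after the bits written so far.
    BitBuilderSketch write(int bitCount, int value) {
        if (bitCount < 1 || bitCount > 31 || bitsWritten + bitCount > 32) {
            throw new IllegalArgumentException("unsupported bit count: " + bitCount);
        }
        int bitMask = ~(0xffffffff << bitCount); // e.g. bitCount 4 gives 0000000f
        bitRegister <<= bitCount;                // make room
        bitRegister |= (value & bitMask);        // or in the masked value
        bitsWritten += bitCount;
        return this;
    }

    // Returns the accumulated bits, most significant byte first.
    byte[] getBytes() {
        if (bitsWritten % 8 != 0) {
            throw new IllegalStateException("not byte-aligned: " + bitsWritten + " bits");
        }
        byte[] out = new byte[bitsWritten / 8];
        for (int i = 0; i < out.length; i++) {
            int shift = 8 * (out.length - 1 - i);
            out[i] = (byte) (bitRegister >>> shift);
        }
        return out;
    }
}
```

Writing the six fields from the usage example (3, 2, 3, 2, 2 and 4 bits) yields the two bytes 0xBA and 0x12.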
edit: I had proposed a different approach in this answer, I'm going to post it as a separate answer because it's completely different.
Look at the Javolution library and its struct classes, they will do just what you are asking for. In fact, the author has this exact example, using the Javolution Struct classes to manipulate UDP packets.
This is an alternate proposal for an answer I left above. I suggest you consider implementing it because it would act pretty much the same as a C solution where you could pick fields out of a packet by name.
You might start it out with an external text file something like this:
OneByte, 1
OneBit, .1
TenBits, .10
AlsoTenBits, 1.2
SignedInt, +4
It could specify the entire structure of a packet, including fields that may repeat. The language could be as simple or complicated as you need--
You'd create an object like this:
PacketReader packetReader = new PacketReader("PacketStructure.txt", packet); // packet is the received byte[]
Your constructor would iterate over the PacketStructure.txt file and store each field name as the key of a hashtable, with the exact location of its data (both bit offset and size) as the value.
Once you created an object, passing in the bitStructure and a packet, you could randomly access the data with statements as straight-forward as:
int x=packetReader.getInt("AlsoTenBits");
Also note, this stuff would be much less efficient than a C struct, but not as much as you might think--it's still probably many times more efficient than you'll need. If done right, the specification file would only be parsed once, so you would only take the minor hit of a single hash lookup and a few binary operations for each value you read from the packet--not bad at all.
The exception is if you are parsing packets from a high-speed continuous stream, and even then I doubt a fast network could flood even a slowish CPU.
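A heavily simplified sketch of this idea follows. It does not parse the text format above: fields are declared in code, in order, each with a size in bits, and every field is read as an unsigned value. The class `PacketLayout` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: fields are declared in code rather than read from a specification file.
class PacketLayout {
    private final Map<String, int[]> fields = new HashMap<>(); // name -> {bitOffset, bitSize}
    private int nextBitOffset = 0;

    // Declares the next field; fields are laid out back to back in declaration order.
    PacketLayout field(String name, int bitSize) {
        if (bitSize < 1 || bitSize > 32) {
            throw new IllegalArgumentException("bitSize must be 1..32");
        }
        fields.put(name, new int[] {nextBitOffset, bitSize});
        nextBitOffset += bitSize;
        return this;
    }

    // Reads a field by name, most significant bit first: one hash lookup plus a few bit operations.
    int getInt(byte[] packet, String name) {
        int[] f = fields.get(name);
        if (f == null) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        int value = 0;
        for (int bit = f[0]; bit < f[0] + f[1]; bit++) {
            int b = (packet[bit / 8] >> (7 - bit % 8)) & 1;
            value = (value << 1) | b;
        }
        return value;
    }
}
```

A layout declared once, for example new PacketLayout().field("OneByte", 8).field("OneBit", 1).field("TenBits", 10), can then be reused for every packet.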
Short answer, no you can't do it that easily.
Longer answer, if you can use Serializable objects, you can hook your InputStream up to an ObjectInputStream and use that to deserialize your objects. However, this requires you have some control over the protocol. It also works easier if you use a TCP Socket. If you use a UDP DatagramSocket, you will need to get the data from the packet and then feed that into a ByteArrayInputStream.
If you don't have control over the protocol, you may be able to still use the above deserialization method, but you're probably going to have to implement the readObject() and writeObject() methods rather than using the default implementation given to you. If you need to use someone else's protocol (say because you need to interop with a native program), this is likely the easiest solution you are going to find.
Also, remember that Java uses UTF-16 internally for strings, but I'm not certain that it serializes them that way. Either way, you need to be very careful when passing strings back and forth to non-Java programs.