Decoding bytes from InputStream

I have a byte stream that is encoded in the following format:
0x20 Length D_1 D_2 ... D_n CS
0x20...marks the beginning of a data frame
Length...number of bytes following
D_1-D_n...data bytes (interpreted as signed ints)
CS...One Byte Checksum
Example:
0x20 0x05 0x01 0x02 0xD1 0xA1 0xAF
0x20 0x05 0x12 0x03 0x02 0x3A 0xF2
...
...
How would I decode this byte stream from an InputStream in a way which is common and efficient? My idea is to use a BufferedInputStream and the following procedure (in pseudo code):
BufferedInputStream bufIS = new BufferedInputStream(mInStream);
while (true) {
    int startFrame = bufIS.read(); // read() returns an int; -1 means end of stream
    if (startFrame == -1) break;
    if (startFrame == 0x20) { // first byte must mark the beginning of a data frame
        int length = bufIS.read();
        byte[] dataframe = new byte[length];
        for (int i = 0; i < length; i++) { // read data bytes
            dataframe[i] = (byte) bufIS.read();
        }
        if (verifyChecksum(bufIS.read())) postEvent(dataframe);
    }
}
Is this an appropriate way to receive the data frames from the InputStream? I can't think of a better solution. There is also the alternative read(byte[] b, int off, int len) method, which reads chunks of bytes from the stream, but it is not guaranteed that len bytes are actually read, so I think it is not helpful in my case.

I'd use an ObjectInputStream with custom deserialization.
So you'd write a class that has a List of ints and implements readObject such that when given an inputstream it reads the constant (and discards it), then the length, then n ints, then the checksum, and optionally validates it.
This keeps all the code in one place, and allows you to just read fully formed objects from your data stream.

I worked on binary serialization at some stage, and we used a BufferedInputStream with the read() method. All the more so because we were reading from a network stream, where the requested 'len' is almost never honoured.
What we did instead was write our own helper method that took a length and the buffer as parameters and returned exactly those bytes for decoding. We used it only when we were certain that the stream would deliver the full number of bytes (well, as long as the stream we read was valid).
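A helper of that kind might look like the sketch below. The names FrameReader, readFully, readFrame and checksum are hypothetical. Two things are assumed because the question does not define them: that Length counts only the data bytes (as in the question's pseudocode), and that the checksum is the low 8 bits of the sum of the data bytes. The JDK's DataInputStream.readFully does the same looping if a ready-made class is acceptable.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class FrameReader {
    static final int START_OF_FRAME = 0x20;

    /** Reads exactly len bytes into buf, looping because read() may return fewer. */
    static void readFully(InputStream in, byte[] buf, int len) throws IOException {
        int done = 0;
        while (done < len) {
            int n = in.read(buf, done, len - done);
            if (n == -1) throw new EOFException("stream ended inside a frame");
            done += n;
        }
    }

    /** ASSUMED checksum rule: low 8 bits of the sum of the data bytes. */
    static int checksum(byte[] data) {
        int sum = 0;
        for (byte b : data) sum += b & 0xFF;
        return sum & 0xFF;
    }

    /** Returns the data bytes of the next valid frame, or null at end of stream. */
    static byte[] readFrame(InputStream in) throws IOException {
        int b;
        while ((b = in.read()) != -1) {
            if (b != START_OF_FRAME) continue;   // skip until a start marker
            int length = in.read();
            if (length == -1) throw new EOFException("missing length byte");
            byte[] data = new byte[length];
            readFully(in, data, length);
            int cs = in.read();
            if (cs == -1) throw new EOFException("missing checksum byte");
            if (cs == checksum(data)) return data;
            // bad checksum: drop the frame and keep scanning
        }
        return null;
    }
}
```

Wrapping the socket stream in a BufferedInputStream before passing it to readFrame keeps the single-byte read() calls cheap.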

Related

In which form add hash of document at the end of document

I get byte array using:
byte[] digitalSignature = signature.sign();
So what is the best way to save this at the end of a txt file, or any other file type, so that I can read it back when I verify the signature? My idea is to build a String starting with "Digital signature:" and append the byte array in String form. I tried this:
String stringAddOnEndOfDocument = new String("Digital signature:" + new String(digitalSignature));
When I read the file, I find "Digital signature:", read the String after it, convert it to a byte array using the getBytes() method, and then delete all of this from the file. But I cannot verify the document's signature this way. I suppose there is a problem with the conversion from bytes to String, but I do not know what exactly.
Here is the code how I verify signature:
deleteHashDataFromEndOfFile(testFile);
byte[] messageBytes = Files.readAllBytes(Paths.get(testFile.toString()));
signature.update(messageBytes);
signature.verify(byteArray)

The best way is to use a separate file, but if you really have to store the signature at the end of the file there are a few ways to do this.
Keep in mind that new String(digitalSignature) decodes the bytes with the platform's default charset and replaces every sequence that is not valid in that charset, which destroys your signature. You need to handle it as a byte[] throughout, or encode it to a printable format using Hex or Base64 encoding.
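As a rough illustration of the Base64 route, the sketch below compares a round trip through new String with one through java.util.Base64. The class name SignatureText and the four sample bytes are made up for the example; a real signature would come from signature.sign().

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class SignatureText {
    static final String MARKER = "Digital signature:";

    /** Builds the printable line to append to the document. */
    static String encode(byte[] signature) {
        return MARKER + Base64.getEncoder().encodeToString(signature);
    }

    /** Recovers the signature bytes from a line produced by encode(). */
    static byte[] decode(String line) {
        return Base64.getDecoder().decode(line.substring(MARKER.length()));
    }

    public static void main(String[] args) {
        // Stand-in for signature.sign(); contains bytes that are not valid UTF-8
        byte[] sig = {(byte) 0x80, (byte) 0xFF, 0x00, 0x1B};

        // Lossy: the invalid bytes are replaced while decoding to a String
        byte[] viaString = new String(sig, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(sig, viaString));            // false

        // Lossless: Base64 maps every byte value to printable characters
        System.out.println(Arrays.equals(sig, decode(encode(sig))));  // true
    }
}
```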
Using "Digital Signature:" as a marker that the signature follows might work, but it breaks if the actual text file contains exactly this text. To fix that, you can either search the whole file for that text and only take the last occurrence, or you can do it in a binary fashion and always store the signature at the very end. Since signatures of a given algorithm and key size usually have the same length, it is enough to slice or copy the last known number of bytes from the file. If there is a chance that the signature has a variable length, you can designate, for example, the last 4 bytes as the length of the signature and be sure that the signature sits exactly before them.
You can use int lengthOfSignature = new BigInteger(theLast4Bytes).intValue() to read the value and write it with BigInteger.valueOf(lengthOfSignature).toByteArray() (make sure that it is 4 bytes long and, if necessary, pad it with 0x00 bytes at the front). When reading the signature length, your code should test whether the number makes sense: that it is positive and within the range you expect (255-257 bytes, for example). After that it is only a little bit of index math to get the signature.
Writing might look like this:
byte[] messageBytes = Files.readAllBytes(file);
byte[] signature = sign(privateKey, messageBytes);
byte[] signatureLength = BigInteger.valueOf(signature.length).toByteArray();
byte[] messageOutput = new byte[messageBytes.length + signature.length + 4];
System.arraycopy(messageBytes, 0, messageOutput, 0, messageBytes.length);
System.arraycopy(signature, 0, messageOutput, messageBytes.length, signature.length);
System.arraycopy(signatureLength, 0, messageOutput, messageBytes.length + signature.length + 4 - signatureLength.length, signatureLength.length); // padding included
// TODO write messageOutput to file
Reading and verifying the signature would look like this:
byte[] msgAndSigBytes = Files.readAllBytes(file);
byte[] signatureLengthBytes = Arrays.copyOfRange(msgAndSigBytes, msgAndSigBytes.length-4, msgAndSigBytes.length);
int signatureLength = new BigInteger(signatureLengthBytes).intValue();
// TODO check for proper size according to your signature algorithm
byte[] signature = Arrays.copyOfRange(msgAndSigBytes, msgAndSigBytes.length-4-signatureLength, msgAndSigBytes.length-4);
byte[] msgBytes = Arrays.copyOf(msgAndSigBytes, msgAndSigBytes.length-4-signatureLength);
boolean success = verify(publicKey, msgBytes, signature); // verification needs the message as well as the signature
If the file is large, streams should be used. Writing works with ordinary input and output streams, but reading requires either a seekable stream or multiple passes.
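The same layout can also be sketched with ByteBuffer, which always writes the length as exactly 4 big-endian bytes, so no manual padding is needed. This is an alternative to the BigInteger code above, not what the answer itself uses; the class name SignatureTrailer and its methods are hypothetical, and the range check is deliberately loose.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class SignatureTrailer {
    /** Layout: message bytes, signature bytes, 4-byte big-endian signature length. */
    static byte[] append(byte[] message, byte[] signature) {
        return ByteBuffer.allocate(message.length + signature.length + Integer.BYTES)
                .put(message)
                .put(signature)
                .putInt(signature.length)
                .array();
    }

    /** Returns {message, signature}; rejects lengths that cannot be right. */
    static byte[][] split(byte[] file) {
        if (file.length < Integer.BYTES) {
            throw new IllegalArgumentException("file too short for a length trailer");
        }
        int lenPos = file.length - Integer.BYTES;
        int sigLen = ByteBuffer.wrap(file, lenPos, Integer.BYTES).getInt();
        // Tighten this check to the sizes your signature algorithm can produce
        if (sigLen <= 0 || sigLen > lenPos) {
            throw new IllegalArgumentException("implausible signature length: " + sigLen);
        }
        byte[] message = Arrays.copyOfRange(file, 0, lenPos - sigLen);
        byte[] signature = Arrays.copyOfRange(file, lenPos - sigLen, lenPos);
        return new byte[][] {message, signature};
    }
}
```

The two arrays returned by split can then be passed to whatever verify routine is in use.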

How FileInputStream and FileOutputStream Works in Java?

I'm reading about input/output streams in Java in the Java Tutorials. The tutorial's authors use this example:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
public class CopyBytes {
    public static void main(String[] args) throws IOException {
        FileInputStream in = null;
        FileOutputStream out = null;
        try {
            in = new FileInputStream("xanadu.txt");
            out = new FileOutputStream("outagain.txt");
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
        } finally {
            if (in != null) {
                in.close();
            }
            if (out != null) {
                out.close();
            }
        }
    }
}
xanadu.txt File data:
In Xanadu did Kubla Khan
A stately pleasure-dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
Output to outagain.txt file:
In Xanadu did Kubla Khan
A stately pleasure-dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
Why do the writers use int c even though we are reading characters?
Why use -1 in the while condition?
How does out.write(c) convert the int back into characters?

1: Why do the writers use int c, even though we are reading characters?
FileInputStream.read() returns one byte of data as an int. This works because a byte can be represented as an int without loss of precision. An int is returned instead of a byte so that -1 can signal the end of the stream without clashing with any of the 256 valid byte values.
2: Why use -1 in the while condition?
When the end of file is reached, -1 is returned.
3: How does out.write(c) convert the int back into characters, so that outagain.txt gets the same output?
FileOutputStream.write() takes a byte parameter as an int. Since an int spans more values than a byte, the 24 high-order bits of the given int are ignored, making it a byte-compatible value: an int in Java is always 32 bits. By removing the 24 high-order bits, you're down to an 8-bit value, i.e. a byte.
I suggest you carefully read the Javadocs for each of those methods. As a reference, they answer all of your questions:
read:
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.
write:
Writes the specified byte to this output stream. The general contract for write is that one byte is written to the output stream. The byte to be written is the eight low-order bits of the argument b. The 24 high-order bits of b are ignored.
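The two contracts quoted above can be checked with in-memory streams. This is only a small sketch; the class name ByteAsInt and its helper methods are invented for the illustration, and ByteArrayInputStream/ByteArrayOutputStream stand in for the file streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class ByteAsInt {
    /** Reads every byte plus one extra read, so the -1 end marker is visible. */
    static int[] readValues(byte[] source) {
        ByteArrayInputStream in = new ByteArrayInputStream(source);
        int[] values = new int[source.length + 1];
        for (int i = 0; i < values.length; i++) {
            values[i] = in.read();
        }
        return values;
    }

    /** Writes each int with write(int); only the low 8 bits are stored. */
    static byte[] writeValues(int... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            out.write(v);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // The byte 0xFF arrives as 255, so it cannot be confused with -1
        System.out.println(Arrays.toString(readValues(new byte[] {(byte) 0xFF, 0x41})));
        // prints [255, 65, -1]

        // 0x141 loses its high bits and is stored as 0x41; 255 is stored as the byte -1
        System.out.println(Arrays.toString(writeValues(0x141, 255)));
        // prints [65, -1]
    }
}
```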

Just read the docs.
here is the read method docs
http://docs.oracle.com/javase/7/docs/api/java/io/FileInputStream.html#read()
public int read()
throws IOException
Reads a byte of data from this input stream. This method blocks if no input is yet available.
Specified by:
read in class InputStream
Returns:
the next byte of data, or -1 if the end of the file is reached.
That int is your next byte of data.
Now, here are the answers.
1) When a byte that holds an ASCII character is stored in an int, the int simply contains that character's ASCII code.
If you are interested, here is the list of chars and their ASCII codes: https://www.cs.cmu.edu/~pattis/15-1XX/common/handouts/ascii.html
2) -1 is returned when the end of the file is reached, so that's the check for whether more data exists.
3) When you write that code back to the output stream, the same byte lands in the file, and it shows up as the corresponding char when the file is viewed as text.

convert int or long to byte hex array

What is the easiest way to convert an integer or a long value to a byte buffer?
example:
input : 325647187
output : {0x13,0x68,0xfb,0x53}
I have tried a ByteBuffer like this:
ByteBuffer buffer = ByteBuffer.allocate(4);
buffer.putLong(325647187);
byte[] x = buffer.array();
for (int i = 0; i < x.length; i++)
{
    System.out.println(x[i]);
}
but I get an exception:
Exception in thread "main" java.nio.BufferOverflowException
at java.nio.Buffer.nextPutIndex(Buffer.java:527)
at java.nio.HeapByteBuffer.putLong(HeapByteBuffer.java:423)
at MainApp.main(MainApp.java:11)

You allocated a 4-byte buffer, but when calling putLong, you attempted to put 8 bytes in it. Hence the overflow. Calling ByteBuffer.allocate(8) will prevent the exception.
Alternatively, if the encoded number is an integer (as in your snippet), it's enough to allocate 4 bytes and call putInt().
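Both variants can be put side by side as in the following sketch. The class name NumberBytes and the hex helper are made up for the example; Integer.BYTES and Long.BYTES are used so that the buffer size always matches the put call.

```java
import java.nio.ByteBuffer;

class NumberBytes {
    /** 4 bytes, most significant byte first (ByteBuffer's default order). */
    static byte[] toBytes(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    /** 8 bytes, most significant byte first. */
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    /** Formats each byte as two hex digits. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("0x%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(toBytes(325647187)));
        // prints 0x13 0x68 0xfb 0x53
        System.out.println(hex(toBytes(325647187L)));
        // prints 0x00 0x00 0x00 0x00 0x13 0x68 0xfb 0x53
    }
}
```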

You can try this for an easier way of converting. With 325647187 as your input, you can do something like this:
byte[] bytes = ByteBuffer.allocate(4).putInt(325647187).array();
for (byte b : bytes)
{
    System.out.format("0x%x ", b);
}
For me this is an efficient (if not the most efficient) way of converting to a byte buffer.

Get bytes from the int returned from socket InputStream read()

I have an InputStream and I want to read each char until I find a comma "," from a socket.
Here's my code:
private static Packet readPacket(InputStream is) throws Exception
{
    int ch;
    Packet p = new Packet();
    String type = "";
    while((ch = is.read()) != 44) //44 is the "," in ISO-8859-1 codification
    {
        if(ch == -1)
            throw new IOException("EOF");
        type += new String(ch, "ISO-8859-1"); //<----DOES NOT COMPILE
    }
    ...
}
The String constructor does not accept an int, only an array of bytes. I read the documentation and it says
read():
Reads the next byte of data from the input stream.
How can I convert this int to a byte, then? Does it use only the least significant 8 bits of the int's 32 bits?
Since I'm working with Java, I want to keep it fully platform compatible (little endian vs. big endian, etc.). What's the best approach here, and why?
PS: I don't want to use any ready-to-use classes like DataInputStream, etc.

The String constructor takes a byte[] (an array) and a charset name, not a single int:
type += new String(new byte[] { (byte) ch }, "ISO-8859-1");
By the way, it would be more elegant to use a StringBuilder for type and make use of its append methods. It's faster and also shows the intent better:
private static Packet readPacket(InputStream is) throws Exception {
    int ch;
    Packet p = new Packet();
    StringBuilder type = new StringBuilder();
    while((ch = is.read()) != 44) {
        if(ch == -1)
            throw new IOException("EOF");
        // NOTE: conversion from byte to char here is iffy, this works for ISO8859-1/US-ASCII
        // but fails horribly for UTF etc.
        type.append((char) ch);
    }
    String data = type.toString();
    ...
}
Also, to make it more flexible (e.g. to work with other character encodings), your method would better take an InputStreamReader that handles the conversion from bytes to characters for you (take a look at the javadoc of the InputStreamReader(InputStream, Charset) constructor).

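For this you can use an InputStreamReader, which can read encoded character data from a raw byte stream:
InputStreamReader reader = new InputStreamReader(is, "ISO-8859-1");
You may now use reader.read(), which decodes the bytes from is as ISO-8859-1 and returns the next character as an int (or -1 at the end of the stream) that can be correctly cast to a char. Be aware that the reader may read ahead and buffer more bytes from is than the current character needs, so read the rest of the stream through the same reader.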
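Put together, reading one comma-terminated field through a reader could look like this sketch. FieldReader and readField are hypothetical names, and the method takes a Reader so that the caller decides on the charset when constructing the InputStreamReader.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class FieldReader {
    /** Reads characters up to, but not including, the next comma. */
    static String readField(Reader reader) throws IOException {
        StringBuilder field = new StringBuilder();
        int ch;
        while ((ch = reader.read()) != ',') {
            if (ch == -1) {
                throw new EOFException("stream ended before ','");
            }
            field.append((char) ch);
        }
        return field.toString();
    }

    public static void main(String[] args) throws IOException {
        // A byte array stands in for the socket's InputStream
        byte[] wire = "LOGIN,\u00e9,".getBytes(StandardCharsets.ISO_8859_1);
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(wire), StandardCharsets.ISO_8859_1);
        System.out.println(readField(reader));   // prints LOGIN
        System.out.println(readField(reader).equals("\u00e9"));   // prints true
    }
}
```

Because the same reader is reused for the second field, its internal read-ahead does not lose any data.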
Edit: Responding to comment about not using any "ready-to-use" classes:
I don't know if InputStreamReader counts. If it does, check out Durandal's answer, which is sufficient for certain single byte encodings (like US-ASCII, arguably, or ISO-8859-1).
For multibyte encodings, if you do not want to use any other classes, you would first buffer all data into a byte[] array, then construct a String from that.
Edit: Responding to a related question in the comments on Abhishek's answer.
Q:
Abhishek wrote: Can you please enlighten me a little more? i have tried casting integer ASCII to character..it has worked..can you kindly tell where did i go wrong?
A:
You didn't go "wrong", per se. The reason ASCII works is the same reason that Brian pointed out that ISO-8859-1 works. US-ASCII is a single byte encoding, and bytes 0x00-0x7f have the same value as their corresponding Unicode code points. So a cast to char is conceptually incorrect, but in practice, since the values are the same, it works. Same with ISO-8859-1; bytes 0x00-0xff have the same value as their corresponding code points in that encoding. A cast to char would not work in e.g. IBM01141 (a single byte encoding but with different values).
And, of course, a single byte to char cast would not work for multibyte encodings like UTF-16, as more than one input byte must be read (a variable number, in fact) to determine the correct value of a corresponding char.
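The difference is easy to see with a character outside ASCII. In the sketch below (CastVersusDecode and castEachByte are names invented for the example), the per-byte cast reproduces the text for ISO-8859-1 but not for UTF-8, where the same character occupies two bytes.

```java
import java.nio.charset.StandardCharsets;

class CastVersusDecode {
    /** Treats every byte as one char, which is what the (char) cast approach does. */
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));   // mask so that 0xE9 becomes 233, not -23
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";   // ends in e with acute accent (U+00E9)

        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);   // 4 bytes
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);          // 5 bytes

        System.out.println(castEachByte(latin1).equals(text));   // true
        System.out.println(castEachByte(utf8).equals(text));     // false
        // Decoding with the right charset works for both encodings
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text));   // true
    }
}
```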

type += new String(String.valueOf((char) ch).getBytes("ISO-8859-1"), "ISO-8859-1");

Partial answer: Try replacing:
type += new String(ch, "ISO-8859-1");
with
type += (char) ch;
This works if the value you receive is the ASCII code of the char; the cast converts that code into a char.
It's better to avoid lengthy code, and this would work just fine. The read() method comes in several forms:
One is: int b = inpstr.read();
Another is: inpstr.read(byte[])
So it's up to you which one you want to use; they serve different purposes.

Decode bytes to chars one at a time

I have an arbitrary chunk of bytes that represent chars, encoded in an arbitrary scheme (may be ASCII, UTF-8, UTF-16). I know the encoding.
What I'm trying to do is find the location of the last new line (\n) in the array of bytes. I want to know how many bytes are left over after reading the last encoded \n.
I can't find anything in the JDK or any other library that will let me convert a byte array to chars one by one. InputStreamReader reads the stream in chunks, not giving me any indication how many bytes are getting read to produce a char.
Am I going to have to do something as horrible as re-encoding each char to figure out its byte length?

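You can try something like this:
CharsetDecoder cd = Charset.forName("UTF-8").newDecoder();
ByteBuffer in = ByteBuffer.wrap(bytes);
CharBuffer out = CharBuffer.allocate(1);
int p = 0;
while (in.hasRemaining()) {
    // Decodes until the one-char buffer is full. Caveat: a supplementary
    // character needs two chars and will not fit into this buffer.
    cd.decode(in, out, true);
    char c = out.array()[0];          // the char that was just decoded
    int nBytes = in.position() - p;   // number of bytes that produced c
    p = in.position();
    out.position(0);                  // make room for the next char
}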
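Building on that loop, a count of the bytes left over after the last \n might be sketched as follows. LastNewline and bytesAfterLastNewline are hypothetical names. The sketch assumes that malformed input should be replaced rather than reported, falls back to a two-char buffer for surrogate pairs, and stops at an incomplete trailing sequence.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

class LastNewline {
    /** Returns how many bytes follow the last encoded '\n' (all of them if there is none). */
    static int bytesAfterLastNewline(byte[] bytes, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer one = CharBuffer.allocate(1);
        CharBuffer two = CharBuffer.allocate(2);
        int afterNewline = 0;   // byte offset just past the last '\n' seen so far

        while (in.hasRemaining()) {
            one.clear();
            CoderResult result = decoder.decode(in, one, false);
            if (one.position() == 1 && one.get(0) == '\n') {
                afterNewline = in.position();
            }
            if (one.position() == 0 && result.isOverflow()) {
                // Next character is a surrogate pair: it needs room for two chars
                two.clear();
                result = decoder.decode(in, two, false);
            }
            if (result.isUnderflow()) {
                break;   // input exhausted, or only an incomplete sequence is left
            }
        }
        return bytes.length - afterNewline;
    }
}
```

For example, with the UTF-8 bytes of "ab\ncd" followed by a euro sign, the method returns 5: two bytes for "cd" and three for the euro sign.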
