Why does CodedInputStream.readRawVarint64() read all the bytes from the underlying stream? - java

Here is a sample code demonstrating the problem.
ByteArrayOutputStream bos = new ByteArrayOutputStream();
CodedOutputStream cos = CodedOutputStream.newInstance(bos);
cos.writeRawVarint64(25);
cos.flush();
bos.write("something else".getBytes());
System.out.println("size(bos) = " + bos.size()); // This gives 15
ByteArrayInputStream bis = new ByteArrayInputStream(bos.toByteArray());
CodedInputStream cis = CodedInputStream.newInstance(bis);
System.out.println("size(bis) = " + bis.available()); // This gives 15
long l = cis.readRawVarint64();
System.out.println(cis.getTotalBytesRead()); // This gives 1, which is correct
System.out.println("Raw varint64 = " + l); // This gives 25, which is correct
System.out.println("size(bis) = " + bis.available()); // This now gives 0!!
All I am trying to do is encode a 64-bit integer and append some more data to the payload. I can read the encoded integer correctly, but for some reason the read empties the underlying stream. Does anyone know why this is happening? How can I read the varint from the stream and then read the number of remaining bytes that the varint indicates?
Any help would be great

I don't know the internals of CodedInputStream, but it very likely buffers its input, meaning it reads a block (for example 100 bytes) at a time from the stream it wraps.
Either way, you should not wrap an InputStream B around an InputStream A and then continue reading from A, precisely because you don't know what B does.
For instance, B may need to look ahead in the data to reach some conclusion, or it may use buffering.
Additional note: relying on available() is usually a bad idea, though it should work correctly on a ByteArrayInputStream specifically.
EDIT:
In conclusion: just continue reading from the CodedInputStream; don't try to read from the underlying stream.
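The read-ahead effect is not specific to protobuf. Here is a minimal sketch using only java.io, with BufferedInputStream standing in for CodedInputStream and a one-byte length prefix standing in for the varint. Both substitutions are assumptions made so the example runs without the protobuf library, and the class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: a buffering wrapper drains the stream it wraps.
final class ReadAheadDemo {

    /** Builds a payload of one length byte followed by the UTF-8 bytes of the text. */
    static byte[] buildPayload(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(body.length);          // single byte; assumes body.length < 256
        bos.write(body, 0, body.length);
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayInputStream raw = new ByteArrayInputStream(buildPayload("something else"));
        InputStream wrapper = new BufferedInputStream(raw);

        int length = wrapper.read();     // reads the prefix, but also fills the wrapper's buffer
        System.out.println("prefix = " + length);
        System.out.println("raw.available() = " + raw.available());

        // The remaining bytes are still reachable, but only through the wrapper.
        byte[] body = wrapper.readNBytes(length);
        System.out.println(new String(body, StandardCharsets.UTF_8));
    }
}
```

With the real library the same rule applies: keep reading the remaining bytes through the CodedInputStream rather than through the ByteArrayInputStream it wraps.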


Sending large data over TCP/IP socket

I have a small project running a server in C# and a client in Java. The server sends images to the client.
Some images are quite big (up to 10MiB sometimes), so I split the image bytes and send it in chunks of 32768 bytes each.
My C# Server code is as follows:
using (var stream = new MemoryStream(ImageData))
{
    for (int j = 1; j <= dataSplitParameters.NumberOfChunks; j++)
    {
        byte[] chunk;
        if (j == dataSplitParameters.NumberOfChunks)
            chunk = new byte[dataSplitParameters.FinalChunkSize];
        else
            chunk = new byte[dataSplitParameters.ChunkSize];
        int result = stream.Read(chunk, 0, chunk.Length);
        string line = DateTime.Now + ", Status OK, " + ImageName + ", ImageChunk, " + j + ", " + dataSplitParameters.NumberOfChunks + ", " + chunk.Length;
        //write read params
        streamWriter.WriteLine(line);
        streamWriter.Flush();
        //write the data
        binaryWriter.Write(chunk);
        binaryWriter.Flush();
        Console.WriteLine(line);
        string deliveryReport = streamReader.ReadLine();
        Console.WriteLine(deliveryReport);
    }
}
And my Java Client code is as follows:
long dataRead = 0;
for (int j = 1; j <= numberOfChunks; j++) {
    String line = bufferedReader.readLine();
    tokens = line.split(", ");
    System.out.println(line);
    int toRead = Integer.parseInt(tokens[tokens.length - 1]);
    byte[] chunk = new byte[toRead];
    int read = inputStream.read(chunk, 0, toRead);
    //do something with the data
    dataRead += read;
    String progressReport = pageLabel + ", progress: " + dataRead + "/" + dataLength + " bytes.";
    bufferedOutputStream.write((progressReport + "\n").getBytes());
    bufferedOutputStream.flush();
    System.out.println(progressReport);
}
The problem is when I run the code, either the client crashes with an error saying it is reading bogus data, or both the client and the server hang. This is the error:
Document Page 1, progress: 49153/226604 bytes.
�9��%>�YI!��F�����h�
Exception in thread "main" java.lang.NumberFormatException: For input string: .....
What am I doing wrong?
The basic problem.
Once you wrap an InputStream in a BufferedReader, you must stop accessing the InputStream directly. A BufferedReader is buffered: it will read as much data as it wants to, and it is NOT limited to reading exactly up to the next newline character(s) and stopping there.
The BufferedReader on the Java side has read a lot more than one line, so it has already consumed a whole bunch of image data, and there's no way to get it back. By creating that BufferedReader you've made the job impossible, so you can't do that.
The underlying problem.
You have a single TCP/IP connection. On this, you send some irrelevant text (the page, the progress, etc), and then you send an unknown amount of image data, and then you send another irrelevant progress update.
That's fundamentally broken. How can an image parser possibly know that halfway through sending an image, you get a status update line? Text is just binary data too, there is no magic identifier that lets a client know: This byte is part of the image data, but this byte is some text sent in-between with progress info.
The simple fix.
You'd think the simple fix is.. well, stop doing that then! Why are you sending this progress? The client is perfectly capable of knowing how many bytes it read, there is no point sending that. Just.. take your binary data. open the outputstream. send all that data. And on the client side, open the inputstream, read all that data. Don't involve strings. Don't use anything that smacks of 'works with characters' (so, BufferedReader? No. BufferedInputStream is fine).
... but now the client doesn't know the title, nor the total size!
So make a wire protocol. It can be near trivial.
This is your wire protocol:
4 bytes, big endian: SizeOfName
SizeOfName number of bytes. UTF-8 encoded document title.
4 bytes, big endian: SizeOfData
SizeOfData number of bytes. The image data.
And that's if you actually want the client to be able to render a progress bar and to know the title. If that's not needed, don't do any of that, just straight up send the bytes, and signal that the file has been completely sent by.. closing the connection.
Here's some sample java code:
try (InputStream in = ....) {
    int nameSize = readInt(in);
    byte[] nameBytes = in.readNBytes(nameSize);
    String name = new String(nameBytes, StandardCharsets.UTF_8);
    int dataSize = readInt(in);
    try (OutputStream out =
            Files.newOutputStream(Paths.get("/Users/TriSky/image.png"))) {
        byte[] buffer = new byte[65536];
        while (dataSize > 0) {
            // Never ask for more than the image has left, so bytes that follow it stay in the stream.
            int r = in.read(buffer, 0, Math.min(buffer.length, dataSize));
            if (r == -1) throw new IOException("Early end-of-stream");
            out.write(buffer, 0, r);
            dataSize -= r;
        }
    }
}

public int readInt(InputStream in) throws IOException {
    byte[] b = in.readNBytes(4);
    // readNBytes returns a shorter array if the stream ends early.
    if (b.length < 4) throw new IOException("Early end-of-stream");
    return ByteBuffer.wrap(b).getInt();
}
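The sample above covers only the receiving side. The sending side of the same wire protocol could be sketched as follows, assuming a Java sender; the class name FrameWriter and the method writeFrame are hypothetical, and the sample name "page-1.png" is made up for the demonstration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: writes one frame in the layout described above.
final class FrameWriter {

    /** Writes SizeOfName, the UTF-8 name, SizeOfData, then the data itself. */
    static void writeFrame(OutputStream out, String name, byte[] data) throws IOException {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        DataOutputStream dos = new DataOutputStream(out);
        dos.writeInt(nameBytes.length);  // DataOutputStream writes ints big-endian
        dos.write(nameBytes);
        dos.writeInt(data.length);
        dos.write(data);
        dos.flush();                     // flush, but leave closing to the caller
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeFrame(wire, "page-1.png", new byte[] {1, 2, 3});

        ByteBuffer buf = ByteBuffer.wrap(wire.toByteArray());
        System.out.println("total bytes = " + wire.size());
        System.out.println("name size   = " + buf.getInt(0));
    }
}
```

If the sender stays in C#, keep in mind that BinaryWriter.Write(int) emits little-endian bytes, so either the server reverses them or the Java side reads the sizes with ByteOrder.LITTLE_ENDIAN.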
Closing notes
Another bug in your app is that you're using the wrong method. Java's read(bytes) method will NOT (necessarily) fill that byte array completely. All read(byte[]) does is read at least 1 byte (unless the end of the stream has been reached, in which case it reads none and returns -1). The idea is that read returns the optimal number of bytes: exactly as many as are available right now. How many is that? Who knows. If you ignore the value returned by in.read(bytes), your code is necessarily broken, and you're doing just that. What you really want is, for example, readNBytes, which guarantees that it fills the byte array completely (or reads until the stream ends, whichever happens first).
Note that in the transfer code above, I also use the basic read, but here I don't ignore the return value.
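To make the short-read behaviour concrete, here is a small sketch. The trickle() stream is a hypothetical stand-in for a slow socket that never hands out more than 3 bytes per call, and readFully() is a hand-written loop that does what readNBytes or DataInputStream.readFully would do for you.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class: one read() call versus a loop that honours the return value.
final class ReadFullyDemo {

    /** Fills the whole array, or throws EOFException if the stream ends first. */
    static void readFully(InputStream in, byte[] target) throws IOException {
        int filled = 0;
        while (filled < target.length) {
            int r = in.read(target, filled, target.length - filled);
            if (r == -1) {
                throw new EOFException("Stream ended after " + filled + " of " + target.length + " bytes");
            }
            filled += r;
        }
    }

    /** Test stream that returns at most 3 bytes per read call, imitating a slow socket. */
    static InputStream trickle(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] source = {10, 11, 12, 13, 14, 15, 16, 17, 18, 19};

        byte[] once = new byte[source.length];
        int got = trickle(source).read(once, 0, once.length);
        System.out.println("single read returned " + got + " of " + once.length + " bytes");

        byte[] all = new byte[source.length];
        readFully(trickle(source), all);
        System.out.println("loop filled the array: " + java.util.Arrays.equals(source, all));
    }
}
```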
Your Java code seems to be using a BufferedReader. It reads data into a buffer of its own, meaning it is no longer available in the underlying socket input stream - that's your first problem. You have a second problem with how inputStream.read is used - it's not guaranteed to read all the bytes you ask for, you would have to put a loop around it.
This is not a particularly easy problem to solve. When you mix binary and text data in the same stream, it is difficult to read it back. In Java, there is a class called DataInputStream that can help a little - it has a readLine method to read a line of text, and also methods to read binary data:
DataInputStream dataInput = new DataInputStream(inputStream);
for (int j = 1; j <= numberOfChunks; j++) {
    String line = dataInput.readLine();
    ...
    byte[] chunk = new byte[toRead];
    dataInput.readFully(chunk); // returns void; blocks until the array is full or throws EOFException
    ...
}
DataInputStream has limitations: the readLine method is deprecated because it assumes the text is encoded in latin-1, and does not let you use a different text encoding. If you want to go further down this road you'll want to create a class of your own to read your stream format.
Some images are quite big (up to 10MiB sometimes), so I split the image bytes and send it in chunks of 32768 bytes each.
You know this is totally unnecessary right? There is absolutely no problem sending multiple megabytes of data into a TCP socket, and streaming all of the data in on the receiving side.
Another option is to open the image as a regular file, split it into chunks, and Base64-encode each chunk before sending it, with the client decoding each chunk on receipt. Base64 turns arbitrary bytes into plain characters such as AfHM65Hkgf7MM. This is only needed when the data has to pass through a text-based reader or writer; a raw TCP stream carries binary data unchanged.

Base64 Encoded to Decoded File Conversion Problem

I am processing very large files (> 2 GB). Each input file is Base64 encoded, and I am outputting to new files after decoding. Depending on the buffer size (LARGE_BUF) and for a given input file, my input to output conversion either works fine, is missing one or more bytes, or throws an exception at the outStream.write line (IllegalArgumentException: Last unit does not have enough bits). Here is the code snippet (I could not cut and paste, so it may not be perfect):
.
.
final int LARGE_BUF = 1024;
byte[] inBuf = new byte[LARGE_BUF];
try (InputStream inputStream = new FileInputStream(inFile); OutputStream outStream = new FileOutputStream(outFile)) {
    for (int len; (len = inputStream.read(inBuf)) > 0; ) {
        String out = new String(inBuf, 0, len);
        outStream.write(Base64.getMimeDecoder().decode(out.getBytes()));
    }
}
For instance, for my sample input file, if LARGE_BUF is 1024, output file is 4 bytes too small, if 2*1024, I get the exception mentioned above, if 7*1024, it works correctly. Grateful for any ideas. Thank you.
First, you are converting bytes into a String, then immediately back into bytes. So, remove the use of String entirely.
Second, base64 encoding turns each sequence of three bytes into four bytes, so when decoding, you need four bytes to properly decode three bytes of original data. It is not safe to create a new decoder for each arbitrarily read sequence of bytes, which may or may not have a length which is an exact multiple of four.
Finally, Base64.Decoder has a wrap(InputStream) method which makes this considerably easier:
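// getMimeDecoder() matches the decoder used in the question and tolerates line breaks;
// getDecoder() also works if the input is unbroken Base64.
try (InputStream inputStream = Base64.getMimeDecoder().wrap(
        new BufferedInputStream(
            Files.newInputStream(Paths.get(inFile))))) {
    Files.copy(inputStream, Paths.get(outFile));
}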
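The second point (chunk boundaries must fall on multiples of four characters) can be checked with a small sketch. The class name is hypothetical, and it uses the basic decoder on a six-byte sample rather than the asker's files.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class: decoding Base64 text in two independently decoded pieces.
final class Base64SplitDemo {

    /** Decodes encoded[0, splitAt) and encoded[splitAt, end) separately and joins the results. */
    static byte[] decodeInTwoPieces(String encoded, int splitAt) {
        Base64.Decoder decoder = Base64.getDecoder();
        byte[] first = decoder.decode(encoded.substring(0, splitAt));
        byte[] second = decoder.decode(encoded.substring(splitAt));
        byte[] joined = Arrays.copyOf(first, first.length + second.length);
        System.arraycopy(second, 0, joined, first.length, second.length);
        return joined;
    }

    public static void main(String[] args) {
        byte[] original = "ABCDEF".getBytes(StandardCharsets.US_ASCII);
        String encoded = Base64.getEncoder().encodeToString(original); // 8 characters

        // Split on a multiple of four: every piece is a whole number of 4-character units.
        System.out.println("split at 4 -> " + decodeInTwoPieces(encoded, 4).length + " bytes");

        // Split inside a unit: each piece is decoded as if it were a complete message.
        System.out.println("split at 2 -> " + decodeInTwoPieces(encoded, 2).length + " bytes");

        try {
            decodeInTwoPieces(encoded, 5);
        } catch (IllegalArgumentException e) {
            System.out.println("split at 5 -> " + e.getMessage());
        }
    }
}
```

An aligned split round-trips, a split that leaves two stray characters silently loses a byte, and a split that leaves a single stray character raises the exception from the question. That matches the three outcomes reported for different buffer sizes.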

How to properly read an InputStream with multiple contents

public static void main(String[] args) throws Exception
{
// Server sends 3 numbers to the client
ByteArrayOutputStream bos = new ByteArrayOutputStream();
bos.write(1000);
bos.write(2000);
bos.write(3000);
// Client receive the bytes
final byte[] bytes = bos.toByteArray();
ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
System.out.println(bis.read());
System.out.println(bis.read());
System.out.println(bis.read());
}
The code above breaks because bis.read() returns an int in the range 0 to 255.
How can I receive those numbers properly? Should I use a delimiter and keep reading the stream until I find it? If so, what if I'm sending multiple files? If the delimiter is a single byte, I think it could also occur somewhere inside a file and break the parsing.
Use decorators for your streams!
All you have to do is wrap your OutputStream and InputStream in java.io.ObjectOutputStream and java.io.ObjectInputStream. These classes support writing and reading ints (4-byte values) with a single method call to writeInt/readInt.
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream os = new ObjectOutputStream(bos);
os.writeInt(1000);
os.writeInt(2000);
os.writeInt(3000);
os.close();
// Client receive the bytes
final byte[] bytes = bos.toByteArray();
ObjectInputStream is = new ObjectInputStream(new ByteArrayInputStream(bytes));
System.out.println(is.readInt());
System.out.println(is.readInt());
System.out.println(is.readInt());
Don't forget to close the streams. Use try/finally or try-with-resources.
Byte stream is a stream of bytes. So if you're reading stream and want to differentiate between different parts of the stream then you should "create" some sort of protocol.
Here are some ideas that can be relevant:
Use a delimiter, as you suggested yourself. If you're concerned about collisions, do not use a one-byte delimiter but something more unique: a sequence that you're sure will not appear in the parts themselves.
At the beginning of the part allocate N bytes (2-4 or maybe more, depending on data) and write the size of the part that will follow.
So that when you create the stream (writer), before actually streaming the "part" - calculate its size and encode it. This is a protocol between reader and writer.
When you read - read the size (=N bytes for example), and then read N bytes. Now you know that the part is ended, and the next part (again, size + content) will follow
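The size-prefix idea above can be sketched as follows, assuming a 4-byte big-endian size in front of every part. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: each part is sent as a 4-byte size followed by that many bytes.
final class LengthPrefixedParts {

    static void writePart(DataOutputStream out, byte[] part) throws IOException {
        out.writeInt(part.length);
        out.write(part);
    }

    /** Returns the next part, or null if the stream ends before a complete size prefix. */
    static byte[] readPart(DataInputStream in) throws IOException {
        int size;
        try {
            size = in.readInt();
        } catch (EOFException endOfStream) {
            return null;
        }
        byte[] part = new byte[size];
        in.readFully(part);              // throws EOFException if the part is cut short
        return part;
    }

    static byte[] encode(List<byte[]> parts) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        for (byte[] part : parts) {
            writePart(out, part);
        }
        out.flush();
        return wire.toByteArray();
    }

    static List<byte[]> decode(byte[] wire) throws IOException {
        List<byte[]> parts = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        for (byte[] part = readPart(in); part != null; part = readPart(in)) {
            parts.add(part);
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = encode(List.of(
                "first part".getBytes(StandardCharsets.UTF_8),
                "second part, any bytes allowed".getBytes(StandardCharsets.UTF_8)));
        for (byte[] part : decode(wire)) {
            System.out.println(new String(part, StandardCharsets.UTF_8));
        }
    }
}
```

Because the reader always knows how many bytes belong to the current part, the content may contain any byte value, which avoids the delimiter collision the question worries about.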
Can you try the ByteBuffer class?
A byte stream is just a stream of bytes. It doesn't understand integers, which need more than one byte each. If you print bytes.length it will return 3, and you surely need more bytes than that for three ints. Allocate 4 bytes for each integer before you write it, then write into them. Check out the ByteBuffer class mentioned above. Hope that helps!

FileInputStream and DataOutputStream - handling byte[] buffer

I've been working on an app to move files between two hosts and while I got the transfer process to work (code is still really messy so sorry for that, I'm still fixing it) I'm kinda left wondering how exactly it handles the buffer. I'm fairly new to networking in java so I just don't want to end up with "meh i got it to work so let's move on" attitude.
File sending code.
public void sendFile(String filepath, DataOutputStream dos) throws Exception{
    if (new File(filepath).isFile()&&dos!=null){
        long size = new File(filepath).length();
        String strsize = Long.toString(size) +"\n";
        //System.out.println("File size in bytes: " + strsize);
        outToClient.writeBytes(strsize);
        FileInputStream fis = new FileInputStream(filepath);
        byte[] filebuffer = new byte[8192];
        while(fis.read(filebuffer) > 0){
            dos.write(filebuffer);
            dos.flush();
        }
    }
}
File receiving code
public void saveFile() throws Exception{
    String size = inFromServer.readLine();
    long longsize = Long.parseLong(size);
    //System.out.println(longsize);
    String tmppath = currentpath + "\\" + tmpdownloadname;
    DataInputStream dis = new DataInputStream(clientSocket.getInputStream());
    FileOutputStream fos = new FileOutputStream(tmppath);
    byte[] filebuffer = new byte[8192];
    int read = 0;
    int remaining = (int)longsize;
    while((read = dis.read(filebuffer, 0, Math.min(filebuffer.length, remaining))) > 0){
        //System.out.println(Math.min(filebuffer.length, remaining));
        //System.out.println(read);
        //System.out.println(remaining);
        remaining -= read;
        fos.write(filebuffer,0, read);
    }
}
I'd like to know how exactly the buffers on both sides are handled, to avoid writing wrong bytes. (I know how the receiving code avoids that, but I'd still like to know how the byte array is handled.)
Does fis/dis always wait for buffers to fill up fully? In receiving code it always writes full array or remaining length if it's less than filebuffer.length but what about fis from sending code.
In fact, your code could have a subtle bug, exactly because of the way you handle buffers.
When you read a buffer from the original file, the read(byte[]) method returns the number of bytes actually read. There is no guarantee that, in fact, all 8192 bytes have been read.
Suppose you have a file with 10000 bytes. Your first read operation reads 8192 bytes. Your second read operation, however, will only read 1808 bytes. The third operation will return -1.
In the first read, you write exactly the bytes that you have read, because you read a full buffer. But in the second read, your buffer actually contains 1808 correct bytes, and the remaining 6384 bytes are wrong - they are still there from the previous read.
In this case you are lucky, because this only happens in the last buffer that you write. Thus, the fact that you stop reading on your client side when you reach the pre-sent length causes you to skip those 6384 wrong bytes which you shouldn't have sent anyway.
But in fact, there is no actual guarantee that reading from the file will return 8192 bytes even if the end was not reached yet. The method's contract does not guarantee that, and it's up to the OS and underlying file system. It could, for example, send you 5000 bytes in your first read, and 5000 in your second read. In this case, you would be sending 3192 wrong bytes in the middle of the file.
Therefore, your code should actually look like:
byte[] filebuffer = new byte[8192];
int read = 0;
while(( read = fis.read(filebuffer)) > 0){
dos.write(filebuffer,0,read);
dos.flush();
}
much like the code you have on the receiving side. This guarantees that only the actual bytes read will be written.
So there is nothing actually magical about the way buffers are handled. You give the stream a buffer, you tell it how much of the buffer it's allowed to fill, but there is no guarantee it will fill all of it. It may fill less and you have to take care and use only the portion it tells you it fills.
Another grave mistake you are making, though, is to just convert the long that you received into an int in this line:
int remaining = (int)longsize;
Files may be longer than an integer contains. Especially things like long videos etc. This is why you get that number as a long in the first place. Don't truncate it like that. Keep the remaining as long and change it to int only after you have taken the minimum (because you know the minimum will always be in the range of an int).
long remaining = longsize;
long fileBufferLen = filebuffer.length;
while((read = dis.read(filebuffer, 0, (int)Math.min(fileBufferLen, remaining))) > 0){
...
}
By the way, there is no real reason to use a DataOutputStream and DataInputStream for this. The read(byte[]), read(byte[],int,int), write(byte[]), and write(byte[],int,int) methods are inherited from the underlying InputStream/OutputStream, and there is no reason not to use the socket's OutputStream/InputStream directly, or to wrap them in a BufferedOutputStream/BufferedInputStream. There is also no need to call flush until you have finished writing.
Also, do not forget to close at least your file input/output streams when you are done with them. You may want to keep the socket input/output streams open for continued communication, but there is no need to keep the files themselves open, it may cause problems. Use a try-with-resources to guarantee that they are closed.
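Putting these points together (use the returned count, keep the remaining size as a long, stop exactly at the announced length), the copy loop could be sketched like this. The class name ExactCopy and the method copyExactly are hypothetical, and in-memory streams stand in for the socket and the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: copies an announced number of bytes and nothing more.
final class ExactCopy {

    /** Copies exactly count bytes from in to out, or throws EOFException if in ends early. */
    static void copyExactly(InputStream in, OutputStream out, long count) throws IOException {
        byte[] buffer = new byte[8192];
        long remaining = count;                       // stays a long: files can exceed 2 GiB
        while (remaining > 0) {
            int want = (int) Math.min(buffer.length, remaining);
            int read = in.read(buffer, 0, want);
            if (read == -1) {
                throw new EOFException(remaining + " bytes missing");
            }
            out.write(buffer, 0, read);               // write only the bytes actually read
            remaining -= read;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = new byte[20_005];               // 20000 file bytes followed by 5 other bytes
        try (InputStream in = new ByteArrayInputStream(wire);
             ByteArrayOutputStream file = new ByteArrayOutputStream()) {
            copyExactly(in, file, 20_000);
            System.out.println("copied " + file.size() + ", left in stream " + in.available());
        }
    }
}
```

The same method works on the sending side: pass the FileInputStream as the source and the socket's OutputStream as the target, with the file length as count.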

ByteArrayOutputStream: Odd behavior

I'm writing a simple client-server application and I wanted to be able to take the attributes of a Header class, turn them into a byte[], send them to the other host, and then convert them back into an easily parsed Header. I was using a ByteArrayOutputStream to do this, but the results were not what I expected. For example, just to test it in main() I had:
Header h = new Header();
h.setSource(111);
h.setDest(222);
h.setSeq(333);
h.setAck(444);
byte[] headerArray = h.convertHeaderToByteArray();
Header newHeader = new Header(headerArray);
Where convertHeaderToByteArray() looked like:
public byte[] convertHeaderToByteArray() {
    byte[] headerArray;
    ByteArrayOutputStream byteStream = new ByteArrayOutputStream(44);
    byteStream.write(srcPort);
    byteStream.write(dstPort);
    byteStream.write(seqNum);
    byteStream.write(ackNum);
    byteStream.write(controlBits);
    headerArray = byteStream.toByteArray();
    return headerArray;
}
And the Header(headerArray) constructor:
public Header(byte[] headerArray) {
    ByteArrayInputStream header = new ByteArrayInputStream(headerArray);
    srcPort = header.read();
    dstPort = header.read();
    seqNum = header.read();
    ackNum = header.read();
}
This definitely did not behave as expected. When I looked at those values at the end, srcPort was correct (111), dstPort was correct (222), seqNum was not correct (77), and ackNum was not correct (188).
After hours of reading and tinkering I couldn't get it right, so I tried to use ByteBuffer instead. Voilà, correct results.
What is going on here? I read the documentation for both and although I spotted some differences I'm not seeing what the source of my error is.
OutputStream.write(int) writes a single byte. See the Javadoc. If you want to write wider values, you will have to use the writeXXX() methods of DataOutputStream, and the corresponding readXXX() methods of DataInputStream to read them.
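As a rough sketch of that suggestion, the four header fields could be written and read as 4-byte ints like this. The class HeaderCodec and its methods are hypothetical and not part of the asker's Header class; controlBits is left out because its type is not shown in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical codec: four int fields stored as 4 bytes each (16 bytes in total).
final class HeaderCodec {

    static byte[] encode(int srcPort, int dstPort, int seqNum, int ackNum) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(16);
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(srcPort);       // writeInt emits all four bytes, big-endian
            out.writeInt(dstPort);
            out.writeInt(seqNum);
            out.writeInt(ackNum);
        } catch (IOException e) {
            throw new UncheckedIOException(e);   // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    /** Returns {srcPort, dstPort, seqNum, ackNum}. */
    static int[] decode(byte[] header) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(header))) {
            int srcPort = in.readInt();
            int dstPort = in.readInt();
            int seqNum = in.readInt();
            int ackNum = in.readInt();
            return new int[] {srcPort, dstPort, seqNum, ackNum};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] fields = decode(encode(111, 222, 333, 444));
        System.out.println(java.util.Arrays.toString(fields));

        // What write(int) keeps: only the low 8 bits of the value.
        System.out.println("333 & 0xFF = " + (333 & 0xFF));
        System.out.println("444 & 0xFF = " + (444 & 0xFF));
    }
}
```

The last two lines show where the 77 and 188 in the question come from: 333 and 444 do not fit in one byte, so only their low 8 bits survive a write(int) call.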
