Copying a file byte-by-byte using DataInputStream stops before completion - java

I have written the following program in Java to copy a file using DataInputStream and DataOutputStream.
import java.io.*;
public class FileCopier {
public static void main(String args[]) throws Exception
{
File file = new File("/home/Sample.mkv");
File copy = new File("/home/Copy_of_Sample.mkv");
if(copy.exists())
{
copy.delete();
}
copy.createNewFile();
DataInputStream din = new DataInputStream(new FileInputStream(file));
DataOutputStream dout = new DataOutputStream(new FileOutputStream(copy));
byte c;
c = 1;
long time = System.currentTimeMillis();
while(c!=-1)
{
c = (byte) din.read();
dout.write(c);
}
time = System.currentTimeMillis() - time;
System.out.println(time);
}
}
However, I am not getting the expected result. As far as I know, the read() method of DataInputStream reads the next byte from the input stream and converts it into an int. Here I am converting that back into a byte and writing that byte to another file, but this is still not working as expected. The original file I am trying to copy is 8.5 MB, while the copy is only 5.3 MB.
Please help me understand why this is happening.
Thanks in advance,
Nishant

The read method returns an integer and not a byte.
Reads the next byte of data from the input stream. The value byte is
returned as an int in the range 0 to 255. If no byte is available
because the end of the stream has been reached, the value -1 is
returned.
So you would get an int, check if the value is not -1, and convert that back to byte to write it to the outputstream.
int c = -1;
while((c = din.read()) !=-1)
{
dout.write((byte)c);
}
As I know, the read() method of DataInputStream reads the next byte from the input stream and converts it into an int.
Yes, so it should be int c; and not byte c;
Here I am converting that again into bytes and writing that byte to another file
That was the mistake: the value was cast to a byte before it was compared with -1.
The original file which I am trying to copy is of size 8.5 MB while the copied one has the size of 5.3 MB only.
As pointed out, a data byte with the value 0xFF, and not the end-of-file marker -1, was encountered.
The read() method returned it as the int 255.
Your code stored it in c as the byte -1, so the while(c!=-1) check failed and the loop terminated partway through the file.
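To make the collision concrete, here is a minimal sketch of what the cast does to the two values involved. The class and method names (ByteCastDemo, narrow) are made up for illustration and are not part of the question's code.

```java
class ByteCastDemo {
    // Mimics what the question's loop does to every value returned by read().
    static byte narrow(int valueFromRead) {
        return (byte) valueFromRead;
    }

    public static void main(String[] args) {
        int dataByte = 255;    // a legitimate data byte 0xFF, as returned by read()
        int endOfStream = -1;  // the EOF sentinel returned by read()

        // As ints the two values are distinct.
        System.out.println(dataByte == endOfStream);                 // false
        // After the cast both become the byte -1 and can no longer be told apart.
        System.out.println(narrow(dataByte) == narrow(endOfStream)); // true
        // Masking with 0xFF recovers the unsigned value of a byte.
        System.out.println(narrow(dataByte) & 0xFF);                 // 255
    }
}
```

This is why the comparison against -1 has to happen on the int, before any narrowing.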

The DataInputStream's read() method reads the next byte of data as a value from 0-255. The byte data type in Java is signed, so its values go from -128 to 127. That is why, when casting to a byte, the value 255 comes out as -1, which stops the loop.
Your best option is to use an int.
int c;
while((c = din.read()) != -1)
{
dout.write(c);
}
Using an int means that the correct value will be read and written, and checking for -1 before writing means the end-of-file marker itself is never written to the copy.
(I don't have enough rep to comment, but a char is unsigned, and two bytes in Java. Its value will never be -1.)

The problem is the (byte) cast before the result is checked for -1, which is an out-of-band sentinel value used by InputStream.read to indicate EOF.
At or around 5.3MB a byte with the value of 255 was read, and (byte)255 == -1, which caused the loop to terminate before the end of the input.
Change it as follows, so that the value is not cast to a byte before the check.
int c;
while (true)
{
c = din.read();
if (c == -1) break; // don't write EOF, ever
dout.write(c); // no cast at all, as it is write(int)
}
Also, this has to be one of the most inefficient copy implementations. Using buffered streams and read(byte[]) are two easy ways to make it much faster.
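A minimal sketch of the read(byte[]) approach mentioned above might look like this. The class name BufferedCopy, the helper copy and the 8192-byte buffer size are illustrative choices, not anything prescribed by the answers; the file paths are the ones from the question.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class BufferedCopy {
    // Copies everything from in to out through a reusable buffer and returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // arbitrary size chosen for this sketch
        long total = 0;
        int n;
        // read(byte[]) returns the number of bytes placed in the buffer, or -1 at end of stream.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only the bytes actually read
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes (and thereby flushes) both streams.
        try (InputStream in = new FileInputStream("/home/Sample.mkv");
             OutputStream out = new FileOutputStream("/home/Copy_of_Sample.mkv")) {
            System.out.println(copy(in, out) + " bytes copied");
        }
    }
}
```

Because the return value of read(byte[]) is a count rather than a data byte, the sign problem from the question cannot occur here.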

Related

Unsigned int to byte

I'm trying to create a custom inputstream. My problem is, the read() method returns an integer from 0-255, but I need to convert it to a byte, decrypt it, and convert it back to an integer. How?
I need something like:
InputStream in = ...;
OutputStream out = ...;
int unsigned = in.read();
byte signed = unsignedIntToSignedByte(unsigned); // from -128 to 127
... // Editing it here
out.write(signedByteToUnsignedInt(signed)); // from 0 - 255
Noting that creating your own encryption is unsafe, and assuming you're doing it "just for fun" and don't think in any way that what you're doing is secure, you really don't need anything special...
int i = in.read();
byte b = (byte) i;
byte e = encrypt(b);
out.write(e);
would be the basic approach, assuming a byte encrypt(byte b) method which does the "encryption". Checking for end-of-stream, exception handling, performance considerations (you don't want to perform things 1 byte at a time), etc. have been left out of this example.
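For reference, the two helper methods named in the question could be sketched as below. The class name ByteConversions is hypothetical; the method names are taken from the question. The narrowing cast keeps the low 8 bits, and masking with 0xFF undoes the sign extension.

```java
class ByteConversions {
    // 0..255 -> -128..127: a narrowing cast keeps the low 8 bits.
    static byte unsignedIntToSignedByte(int unsigned) {
        return (byte) unsigned;
    }

    // -128..127 -> 0..255: masking removes the sign extension.
    static int signedByteToUnsignedInt(byte signed) {
        return signed & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(unsignedIntToSignedByte(200));       // -56
        System.out.println(signedByteToUnsignedInt((byte) -56)); // 200
    }
}
```

Note that OutputStream.write(int) only writes the low 8 bits of its argument, so passing a byte directly, as the answer does, produces the same output as converting it first.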

Faster way to read from Socket with ByteBuffer?

I have connected by TCP to a socket which is constantly sending a large amount of data, which I need to read in. What I have so far is a byte buffer that is reading byte by byte in a while loop. But the test case I am using right now is about 3 MB, which takes a while to read when reading in byte by byte.
Here is my code for this explanation:
ByteBuffer buff = ByteBuffer.allocate(3200000);
while(true)
{
int b = in.read();
if(b == -1 || buff.remaining() == 0)
{
break;
}
buff.put((byte)b);
}
I know that byte buffers are not thread safe and I'm not sure if this could be made faster by possibly reading in multiple bytes at a time and then storing it in the buffer? What would be a way for me to speed this process up?
Use a bulk read instead of a single byte read.
byte[] buf = new byte[3200000];
int pos = 0;
while (pos < buf.length) {
int n = in.read(buf, pos, buf.length - pos);
if (n < 0)
break;
pos += n;
}
ByteBuffer buff = ByteBuffer.wrap(buf, 0, pos);
Instead of getting an InputStream from the socket, and filling a byte array to be wrapped, you can get the SocketChannel and read() directly to the ByteBuffer.
There are several ways.
1. Use Channels.newChannel() to get a channel from the input stream and use ReadableByteChannel.read(buffer).
2. Get the byte[] array from the buffer with buffer.array() and read directly into that with in.read(array). Make sure the BB really does have an array of course. If it's a direct byte buffer it won't, but in that case you shouldn't be doing all this at all, you should be using a SocketChannel, otherwise there is zero benefit.
3. Read into your own largeish byte array and then use a bulk put into the ByteBuffer, taking care to use the length returned by the read() method.
4. Don't do it. Make up your mind as to whether you want InputStreams or ByteBuffers and don't mix your programming metaphors.
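As a rough sketch of the Channels.newChannel() option, assuming the data still arrives through an InputStream as in the question: the class name ChannelRead and the method readInto are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

class ChannelRead {
    // Reads until the buffer is full or the stream ends, then flips the buffer for reading.
    static ByteBuffer readInto(InputStream in, int capacity) throws IOException {
        ByteBuffer buff = ByteBuffer.allocate(capacity);
        ReadableByteChannel channel = Channels.newChannel(in);
        while (buff.hasRemaining() && channel.read(buff) != -1) {
            // each read() call transfers a whole chunk of bytes, not a single byte
        }
        buff.flip(); // position = 0, limit = number of bytes read
        return buff;
    }

    public static void main(String[] args) throws IOException {
        // A ByteArrayInputStream stands in for the socket's input stream here.
        ByteBuffer buff = readInto(new ByteArrayInputStream(new byte[1000]), 3200000);
        System.out.println(buff.remaining()); // 1000
    }
}
```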

JDK7 Files.copy

In the OpenJDK7 project java.nio.file.Files, there is the following function. My question is, should the while loop condition be >= instead of >? This is because the source.read javadoc says that when EOF is reached, it'll return -1 and not 0.
/**
* Reads all bytes from an input stream and writes them to an output stream.
*/
private static long copy(InputStream source, OutputStream sink)
throws IOException
{
long nread = 0L;
byte[] buf = new byte[BUFFER_SIZE];
int n;
while ((n = source.read(buf)) > 0) {
sink.write(buf, 0, n);
nread += n;
}
return nread;
}
Whether this is a bug or not depends on the intent of the function.
Normally this will work exactly as you expect since the call to read will block until at least one byte of data becomes available. However, if the input stream is non-blocking, the read call will return 0 when there is currently no more data available. This state is different from the stream being actively closed.
In other words, one could argue that this is a bug or not, depending on what you expect it to do when faced with a non-blocking stream that has no data available at the moment the method is called.
It makes no difference, because InputStream.read(byte[]) won't return 0 here. From the Javadoc:
at least one byte is read
You are looking at the wrong read function.
The read function of InputStream that takes a byte array will return the number of bytes that have been copied into the buffer. So you can know how many bytes you can then copy out of it.
* @return the total number of bytes read into the buffer, or
* <code>-1</code> if there is no more data because the end of
* the stream has been reached.
So it does cover both cases: end of stream reached (-1) or no bytes read into the buffer for any other reason.
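The return values under discussion can be observed with a small experiment. This is only a sketch using a ByteArrayInputStream as the source; the class name ReadReturnValues and the method probe are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class ReadReturnValues {
    // Returns the results of three read(byte[]) calls on a 3-byte stream.
    static int[] probe() throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3});
        int zeroLength = in.read(new byte[0]); // zero-length buffer: nothing is read, returns 0
        int firstRead = in.read(new byte[8]);  // 3 bytes available: returns 3
        int atEnd = in.read(new byte[8]);      // stream exhausted: returns -1
        return new int[] {zeroLength, firstRead, atEnd};
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(probe())); // [0, 3, -1]
    }
}
```

With a non-empty buffer, as in the copy method above, the only results are a positive count or -1, so > 0 and >= 0 behave the same in that loop.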

Reading from socket input stream returns data in the wrong order

I am sending byte arrays over a socket. The sent data starts off with 4 bytes indicating the length of the following byte array.
// get the amount of data being sent
byte[] lengthOfReplyAsArray = new byte[4];
forceRead(inputStream, lengthOfReplyAsArray);
int lengthOfReply = byteArrayToInt(lengthOfReplyAsArray);
// read the data into a byte array
byte[] reply = new byte[lengthOfReply];
forceRead(inputStream, reply);
The method used to read data from an InputStream:
private byte[] forceRead(InputStream inputStream, byte[] result)
throws IOException {
int bytesRead = 0;
int total = result.length;
int remaining = total;
while (remaining > 0)
remaining -= inputStream.read(result, bytesRead, remaining);
return result;
}
The method used to convert a byte array to an integer:
private int byteArrayToInt(byte[] byteArray) {
int result = 0;
for (int i = 0; (i<byteArray.length) && (i<8); i++) {
result |= (byteArray[3-i] & 0xff) << (i << 3);
}
return result;
}
The problem is, that the data is not read in the order it arrives. The first 4 bytes are being read just fine. The rest is mixed up. I made a TCP dump to ensure the data correctly arrives at the client. It seems as if the data is split up into 4 TCP packets. The InputStream returns the first 4 bytes of the first packet, then the entire data of the fourth packet, the last part (starting from "length of last packet") of the second packet and the entire data of the third packet. In this order.
Does anyone have a clue what might cause this issue?
Your logic for reading the byte array is not quite right:
From the docs:
Reads up to len bytes of data from the input stream into an array of
bytes. An attempt is made to read as many as len bytes, but a smaller
number may be read. The number of bytes actually read is returned as
an integer.
and
The first byte read is stored into element b[off], the next one into
b[off+1], and so on. The number of bytes read is, at most, equal to
len. Let k be the number of bytes actually read; these bytes will be
stored in elements b[off] through b[off+k-1], leaving elements
b[off+k] through b[off+len-1] unaffected.
However, as your bytesRead variable stays at 0 for the whole loop, any data from the inputstream is always written to the beginning of your buffer, overwriting the data already in there.
The following will work better. Checking for -1 also ensures that you don't subtract -1 from remaining if the stream runs out of data prematurely; that would increase remaining, and the loop would keep running instead of stopping:
while (remaining > 0
&& (bytesRead = inputStream.read(result, total - remaining, remaining)) != -1) {
remaining -= bytesRead;
}
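Put together as a complete method, the corrected loop might look like the sketch below. It tracks an explicit offset instead of deriving it from remaining, and it throws EOFException on a premature end of stream, which is a design choice of this sketch rather than something the answer specifies. The chunked helper is a hypothetical test aid that imitates a socket delivering data in small pieces.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class ForceRead {
    // Fills result completely, advancing the write offset after every partial read.
    static byte[] forceRead(InputStream in, byte[] result) throws IOException {
        int offset = 0;
        while (offset < result.length) {
            int n = in.read(result, offset, result.length - offset);
            if (n == -1) {
                throw new EOFException("stream ended after " + offset + " bytes");
            }
            offset += n; // the next read continues where this one stopped
        }
        return result;
    }

    // Hypothetical helper: a stream that hands out at most chunk bytes per read call.
    static InputStream chunked(byte[] data, int chunk) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, chunk));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        byte[] reply = forceRead(chunked(data, 3), new byte[data.length]);
        System.out.println(Arrays.equals(data, reply)); // true
    }
}
```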

Java InputStream.read(byte[], int, int) method, how to block until the exact number of bytes has been read

I'm writing a simple client/server network application that sends and receives fixed size messages through a TCP socket.
So far, I've been using the getInputStream() and getOutputStream() methods of the Socket class to get the streams and then call the read(byte[] b, int off, int len) method of the InputStream class to read 60 bytes each time (which is the size of a message).
Later on, I read the Javadoc for that method:
public int read(byte[] b,
int off,
int len)
throws IOException
Reads up to len bytes of data from the input stream into an array of
bytes. An attempt is made to read as many as len bytes, but a smaller
number may be read. The number of bytes actually read is returned as
an integer.
I was wondering if there's any Java "out-of-the-box" solution to block until len bytes have been read, waiting forever if necessary.
I can obviously create a simple loop but I feel like I'm reinventing the wheel. Can you suggest me a clean and Java-aware solution?
Use DataInputStream.readFully. Its Javadoc directs the reader to the DataInput Javadoc, which states:
Reads some bytes from an input stream and stores them into the buffer array b. The number of bytes read is equal to the length of b.
InputStream in = ...
DataInputStream dis = new DataInputStream( in );
byte[] array = ...
dis.readFully( array );
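Applied to the 60-byte messages from the question, this could be sketched as follows. The class name FixedSizeMessages and the method readMessage are made up for illustration. readFully blocks until the array is full and throws EOFException if the stream ends first.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class FixedSizeMessages {
    static final int MESSAGE_SIZE = 60; // message size stated in the question

    // Blocks until one whole message has been read; throws EOFException on a short stream.
    static byte[] readMessage(InputStream in) throws IOException {
        byte[] message = new byte[MESSAGE_SIZE];
        new DataInputStream(in).readFully(message);
        return message;
    }

    public static void main(String[] args) throws IOException {
        // Two full messages followed by end of stream; stands in for the socket stream.
        InputStream in = new ByteArrayInputStream(new byte[2 * MESSAGE_SIZE]);
        System.out.println(readMessage(in).length); // 60
        System.out.println(readMessage(in).length); // 60
        try {
            readMessage(in);
        } catch (EOFException e) {
            System.out.println("no more messages");
        }
    }
}
```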
The simple loop is the way to go. Given the very small number of bytes you're exchanging, I guess it will need just one iteration to read everything, but if you want to make it correct, you have to loop.
A simple for one-liner will do the trick:
int toread = 60;
byte[] buff = new byte[toread];
for(int index=0;index<toread;index+=in.read(buff,index,toread-index));
But most of the time, the only reason fewer bytes would be read is that the stream has ended or the bytes haven't all been flushed on the other side.
I think the correct version of ratchet freak's answer is this :
for (int index = 0; index < toRead && (read = inputStream.read(bytes, index, toRead-index))>0 ; index+= read);
it stops reading if read returns -1
