I have connected via TCP to a socket that is constantly sending a large amount of data, which I need to read in. What I have so far is a byte buffer that reads byte by byte in a while loop. But the test case I am using right now is about 3 MB, which takes a while to read when reading byte by byte.
Here is my code so far:
ByteBuffer buff = ByteBuffer.allocate(3200000);
while (true)
{
    int b = in.read();
    if (b == -1 || buff.remaining() == 0)
    {
        break;
    }
    buff.put((byte) b);
}
I know that byte buffers are not thread safe. Could this be made faster by reading multiple bytes at a time and then storing them in the buffer? What would be a way for me to speed this process up?
Use a bulk read instead of a single byte read.
byte[] buf = new byte[3200000];
int pos = 0;
while (pos < buf.length) {
    int n = in.read(buf, pos, buf.length - pos);
    if (n < 0)
        break;
    pos += n;
}
ByteBuffer buff = ByteBuffer.wrap(buf, 0, pos);
Instead of getting an InputStream from the socket and filling a byte array to be wrapped, you can get the SocketChannel and read() directly into the ByteBuffer.
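A rough sketch of that approach (the host, port and buffer size are placeholders, and this opens the channel directly with SocketChannel.open rather than retrieving it from an existing Socket):

SocketChannel channel = SocketChannel.open(new InetSocketAddress("example.com", 9000));
ByteBuffer buff = ByteBuffer.allocate(3200000);
while (buff.hasRemaining()) {
    int n = channel.read(buff);   // reads directly into the ByteBuffer
    if (n == -1) {                // end of stream
        break;
    }
}
buff.flip();                      // switch the buffer to reading what was received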
There are several ways:

1. Use Channels.newChannel() to get a channel from the input stream and use ReadableByteChannel.read(buffer) (a sketch follows this list).
2. Get the byte[] array from the buffer with buffer.array() and read directly into that with in.read(array). Make sure the ByteBuffer really does have an array, of course. If it's a direct byte buffer it won't, but in that case you shouldn't be doing any of this at all; you should be using a SocketChannel, otherwise there is zero benefit.
3. Read into your own largish byte array and then use a bulk put into the ByteBuffer, taking care to use the length returned by the read() method.
4. Don't do it. Make up your mind as to whether you want InputStreams or ByteBuffers and don't mix your programming metaphors.
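For example, option 1 might look roughly like this (a sketch only; in is the socket's InputStream from the question and the buffer size is arbitrary):

ReadableByteChannel channel = Channels.newChannel(in);   // java.nio.channels
ByteBuffer buff = ByteBuffer.allocate(3200000);
while (buff.hasRemaining()) {
    if (channel.read(buff) == -1) {   // fills the buffer directly from the stream
        break;                        // stream ended before the buffer filled up
    }
}
buff.flip();                          // prepare the buffer for reading what was received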
While writing Java code, I often wonder why some methods require the byte array's length as an argument when the first argument is already the byte array object. Why don't they get the length from the object provided?
For example:
// E.g.: 1. Bitmap
byte[] bytes = task.getResult();
Bitmap bitmap = BitmapFactory.decodeByteArray(bytes, 0, bytes.length);
// E.g.: 2. Datagram
byte[] data = new byte[1024];
DatagramPacket request = new DatagramPacket(data, data.length);
If they want the length, why don't they just use data.length?
The byte array is a buffer into which data is read, and that data may be shorter than the buffer itself. The length parameter defines the number of bytes in the buffer that are relevant. You're not supposed to pass the length of the buffer in that parameter; that would be redundant. You're supposed to pass the number of bytes in the buffer that contain actual data.
The API documentation of DatagramPacket, for example, reveals this.
length - the number of bytes to read
The simple answer is: most read methods (in Java and in most other languages) that operate on buffer arrays have to tell you the exact number of bytes that were actually read.
Keep in mind: that array is a buffer. The default behavior is that up to buffer.length bytes can be read, possibly fewer. So knowing how long the buffer is doesn't help you; you have to know how many bytes were actually put into the buffer.
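To make that concrete with the DatagramPacket from the question (a sketch; the port number and the String conversion are just for illustration):

DatagramSocket socket = new DatagramSocket(4445);        // example port
byte[] buffer = new byte[1024];                          // the buffer
DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
socket.receive(packet);                                  // blocks until a datagram arrives
int used = packet.getLength();                           // how many bytes actually contain data
String text = new String(buffer, 0, used);               // ignore the unused tail of the buffer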
Broadly speaking, a buffer is used as temporary storage during a data-loading process.
You fill the buffer up to its size or less, but of course never beyond its capacity.
The DatagramPacket javadoc confirms this:
The length argument must be less than or equal to buf.length.
And one thing not to forget: conceptually, you use a buffer because the data has to be loaded progressively, or because only a specific part of it is needed.
In some cases you will read as much data as the buffer's maximum capacity, but in other cases you need to read only the first X bytes, or the bytes from offset X to Y.
So buffer classes generally provide multiple ways to read from the buffer, such as:
public DatagramPacket(byte buf[], int length);
public DatagramPacket(byte buf[], int offset, int length);
Now, conceptually you are not wrong: sometimes you want to fill the whole buffer because you know you will need to read exactly that much data.
The java.net.DatagramSocket source confirms this:
public synchronized void receive(DatagramPacket p) throws IOException {
    ...
    tmp = new DatagramPacket(new byte[1024], 1024);
    ...
}
So an additional overload such as:
public DatagramPacket(byte buf[]);
would make sense.
Because the amount of data that you want to read can be less than or equal to the length of byte[] buf.
Below is the API documentation:
public DatagramPacket(byte[] buf, int length)
Constructs a DatagramPacket for receiving packets of length length.
The length argument must be less than or equal to buf.length.
Parameters:
buf - buffer for holding the incoming datagram.
length - the number of bytes to read.
https://docs.oracle.com/javase/7/docs/api/java/net/DatagramPacket.html
I have written the following program in Java to copy a file using DataInputStream/DataOutputStream.
import java.io.*;

public class FileCopier {
    public static void main(String args[]) throws Exception
    {
        File file = new File("/home/Sample.mkv");
        File copy = new File("/home/Copy_of_Sample.mkv");
        if (copy.exists())
        {
            copy.delete();
        }
        copy.createNewFile();
        DataInputStream din = new DataInputStream(new FileInputStream(file));
        DataOutputStream dout = new DataOutputStream(new FileOutputStream(copy));
        byte c;
        c = 1;
        long time = System.currentTimeMillis();
        while (c != -1)
        {
            c = (byte) din.read();
            dout.write(c);
        }
        time = System.currentTimeMillis() - time;
        System.out.println(time);
    }
}
However, I am not getting the expected result. As I understand it, the read() method of DataInputStream reads the next byte from the input stream and converts it into an int. Here I am converting that back into a byte and writing that byte to another file, but still this is not working as expected. The original file which I am trying to copy is 8.5 MB in size, while the copy is only 5.3 MB.
Please help me understand why this is happening.
Thanks in advance,
Nishant
The read method returns an integer and not a byte.
Reads the next byte of data from the input stream. The value byte is
returned as an int in the range 0 to 255. If no byte is available
because the end of the stream has been reached, the value -1 is
returned.
So you would get an int, check if the value is not -1, and convert it back to a byte to write it to the output stream.
int c = -1;
while ((c = din.read()) != -1)
{
    dout.write((byte) c);
}
As i know read() method DataInputStream reads next byte from input stream and converts it into int.
Yes. So it should be int c; and not byte c;
Here i am converting that again into bytes and writing that byte to another file
So, this was your mistake: converting the int returned by read() back into a byte before checking it.
The original file which i am trying to copy is of size 8.5 MB while the copied one has the size of 5.3 MB only.
As pointed out, a byte with the value -1 (bit pattern 0xFF), and not the end-of-file marker -1, was encountered.
It was returned as the integer 255 by the read() method.
Your code cast it to a byte and stored it as -1 in c, making the while(c != -1) condition false and terminating the loop halfway through.
The DataInputStream's read() method reads the next byte of data as a value from 0 to 255. The byte data type in Java is signed, so its values go from -128 to 127. That is why, when it is cast to a byte, the value 255 comes out as -1, which stops the loop.
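You can see the effect of the narrowing cast in isolation (plain Java semantics, nothing stream-specific):

int c = 255;             // what read() returns for the byte 0xFF
byte b = (byte) c;       // narrowing keeps only the low 8 bits
System.out.println(b);   // prints -1, indistinguishable from the EOF marker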
Your best option is to use an int.
int c;
while ((c = din.read()) != -1)
{
    dout.write(c);
}
Using an int means that the correct value will be read and written.
(I don't have enough rep to comment, but a char is unsigned, and two bytes in Java. Its value will never be -1.)
The problem is the (byte) cast before the result is checked for -1, which is an out-of-band sentinel value used by InputStream.read to indicate EOF.
At or around 5.3MB a byte with the value of 255 was read, and (byte)255 == -1, which caused the loop to terminate before the end of the input.
Change it like so, so that the result is not cast to a byte.
int c;
while (true)
{
    c = din.read();
    if (c == -1) break; // don't write EOF, ever
    dout.write(c);      // no cast at all, as it is write(int)
}
Also, this has to be one of the most inefficient copy implementations around: using buffered streams and read(byte[]) are two easy ways to make it much faster.
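For instance, the read(byte[]) variant might look roughly like this (a sketch with an arbitrary 8 KB buffer, reusing the din and dout streams from the question); it avoids one method call per byte:

byte[] buffer = new byte[8192];          // arbitrary buffer size
int n;
while ((n = din.read(buffer)) != -1) {
    dout.write(buffer, 0, n);            // write only the bytes actually read
}
dout.flush();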
I'm trying to understand the significance of the offset field of an OutputStream's write() function. It makes a lot of sense to me in the case of a file output, where seeking is a thing...but if I'm connected to a TCP stream like so:
Socket mSocket = null;
OutputStream mOutputStream = null;
byte[] msgBytes = {some message};
mSocket = new Socket();
mSocket.connect(new InetSocketAddress({arbitrary IP},{arbitrary port}));
mOutputStream = mSocket.getOutputStream();
and I send bytes like so:
// length is the length of msgBytes.
mOutputStream.write(msgBytes, 0, length);
//msgBytes2 is another message with length2
mOutputStream.write(msgBytes2, length, length2);
Does this have any different effect if I always set offset to 0 like so?
// length is the length of msgBytes.
mOutputStream.write(msgBytes, 0, length);
//msgBytes2 is another message with length2
mOutputStream.write(msgBytes2, 0, length2);
The codebase I'm working on sometimes sends 0, sometimes sends the length of the previous message. I know that if I manually send something large in that offset, I get an out of bounds error. Just trying to understand the effects of this...it doesn't make sense to me that any value of offset would be honored, since the message will immediately start streaming out the TCP port once the write call finishes, right?
The offset is the position in the byte[] passed in at which to start copying data.
From the Javadoc for OutputStream.write(byte[], int, int):
Writes len bytes from the specified byte array starting at offset off to this output stream. The general contract for write(b, off, len) is that some of the bytes in the array b are written to the output stream in order; element b[off] is the first byte written and b[off+len-1] is the last byte written by this operation.
It is nothing more complicated than that and it doesn't make sense to use a length from an unrelated byte[] as an offset.
//msgBytes2 is another message with length2
mOutputStream.write(msgBytes2, length, length2);
This causes the write to start at index length within msgBytes2; it is similar to:
for (int i = length; i < length + length2; i++)
    mOutputStream.write(msgBytes2[i]);
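Where the offset and length do earn their keep is when only part of an array should be sent, for example forwarding exactly the bytes a read() call produced (a sketch; in and mOutputStream stand in for whatever streams you actually have):

byte[] buffer = new byte[4096];
int n = in.read(buffer);                 // n may be less than buffer.length
if (n > 0) {
    mOutputStream.write(buffer, 0, n);   // send only the bytes that were actually read
}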
I am sending byte arrays over a socket. The sent data starts off with 4 bytes indicating the length of the following byte array.
// get the amount of data being sent
byte[] lengthOfReplyAsArray = new byte[4];
forceRead(inputStream, lengthOfReplyAsArray);
int lengthOfReply = byteArrayToInt(lengthOfReplyAsArray);
// read the data into a byte array
byte[] reply = new byte[lengthOfReply];
forceRead(inputStream, reply);
The method used to read data from an InputStream:
private byte[] forceRead(InputStream inputStream, byte[] result)
throws IOException {
int bytesRead = 0;
int total = result.length;
int remaining = total;
while (remaining > 0)
remaining -= inputStream.read(result, bytesRead, remaining);
return result;
}
The method used to convert a byte array to an integer:
private int byteArrayToInt(byte[] byteArray) {
    int result = 0;
    for (int i = 0; (i < byteArray.length) && (i < 8); i++) {
        result |= (byteArray[3 - i] & 0xff) << (i << 3);
    }
    return result;
}
The problem is, that the data is not read in the order it arrives. The first 4 bytes are being read just fine. The rest is mixed up. I made a TCP dump to ensure the data correctly arrives at the client. It seems as if the data is split up into 4 TCP packets. The InputStream returns the first 4 bytes of the first packet, then the entire data of the fourth packet, the last part (starting from "length of last packet") of the second packet and the entire data of the third packet. In this order.
Does anyone have a clue what might cause this issue?
Your logic for reading the byte array is not quite right:
From the docs:
Reads up to len bytes of data from the input stream into an array of
bytes. An attempt is made to read as many as len bytes, but a smaller
number may be read. The number of bytes actually read is returned as
an integer.
and
The first byte read is stored into element b[off], the next one into
b[off+1], and so on. The number of bytes read is, at most, equal to
len. Let k be the number of bytes actually read; these bytes will be
stored in elements b[off] through b[off+k-1], leaving elements
b[off+k] through b[off+len-1] unaffected.
However, as your bytesRead variable stays at 0 for the whole loop, any data from the input stream is always written to the beginning of your buffer, overwriting the data already in there.
What will work better is the loop below; checking for -1 also ensures that you don't subtract -1 from remaining if the stream runs out of data prematurely, which would make remaining grow and keep the loop running with an invalid offset:
while ((bytesRead = inputStream.read(result, total - remaining, remaining)) != -1
        && remaining > 0) {
    remaining -= bytesRead;
}
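Putting it together, a corrected forceRead might look roughly like this (a sketch that also throws if the stream ends before the buffer is full, rather than returning a partially filled array):

private byte[] forceRead(InputStream inputStream, byte[] result) throws IOException {
    int total = result.length;
    int remaining = total;
    while (remaining > 0) {
        // read into the part of the buffer that is still empty
        int bytesRead = inputStream.read(result, total - remaining, remaining);
        if (bytesRead == -1) {
            throw new EOFException("stream ended with " + remaining + " bytes still expected");
        }
        remaining -= bytesRead;
    }
    return result;
}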
I'm writing a simple client/server network application that sends and receives fixed size messages through a TCP socket.
So far, I've been using the getInputStream() and getOutputStream() methods of the Socket class to get the streams and then call the read(byte[] b, int off, int len) method of the InputStream class to read 60 bytes each time (which is the size of a message).
Later on, I read the Javadoc for that method:
public int read(byte[] b,
int off,
int len)
throws IOException
Reads up to len bytes of data from the input stream into an array of
bytes. An attempt is made to read as many as len bytes, but a smaller
number may be read. The number of bytes actually read is returned as
an integer.
I was wondering if there's any Java "out-of-the-box" solution to block until len bytes have been read, waiting forever if necessary.
I can obviously create a simple loop but I feel like I'm reinventing the wheel. Can you suggest me a clean and Java-aware solution?
Use DataInputStream.readFully. Its Javadoc directs the reader to the DataInput Javadoc, which states:
Reads some bytes from an input stream and stores them into the buffer array b. The number of bytes read is equal to the length of b.
InputStream in = ...
DataInputStream dis = new DataInputStream( in );
byte[] array = ...
dis.readFully( array );
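One detail from the DataInput contract worth knowing: readFully throws an EOFException if the stream ends before the array has been filled, so a truncated message surfaces as an exception rather than a silent short read. A sketch for the 60-byte messages described in the question:

byte[] message = new byte[60];
try {
    dis.readFully(message);   // blocks until all 60 bytes have arrived
} catch (EOFException e) {
    // the peer closed the connection mid-message; handle or rethrow as appropriate
}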
The simple loop is the way to go. Given the very small number of bytes you're exchanging, I guess it will need just one iteration to read everything, but if you want to make it correct, you have to loop.
A simple for one-liner will do the trick:
int toread = 60;
byte[] buff = new byte[toread];
for (int index = 0; index < toread; index += in.read(buff, index, toread - index));
But most of the time the only reason fewer bytes would be read is that the stream has ended or the bytes haven't all been flushed on the other side.
I think the correct version of ratchet freak's answer is this:
int read;
for (int index = 0; index < toRead && (read = inputStream.read(bytes, index, toRead - index)) > 0; index += read);
It stops reading if read() returns -1.