I'm making a program in Java with sockets. I can send commands from the server to the client and vice versa. To read the commands I use a BufferedReader; to write them, a PrintWriter. But now I want to transfer a file through that same socket (not simply create a second connection). First I write to the output stream how many bytes the file contains, for example 40000 bytes. So I write the number 40000 through the socket, but the other side of the connection reads 78.
So I was thinking: the BufferedReader reads more than just the line (when calling readLine()), and that way I lose some bytes of the file data, because they are sitting in the BufferedReader's buffer.
So the number 78 is a byte of the file I want to transmit.
Is this way of thinking right or not? If so, how do I solve this problem?
I hope I've explained it well.
Here is my code. My native language is Dutch, so some variable names may sound strange.
public void flushStreamToStream(InputStream is, OutputStream os, boolean closeIn, boolean closeOut) throws IOException {
    byte[] buffer = new byte[BUFFERSIZE];
    int bytesRead;
    if ((!closeOut) && closeIn) { // To socket from file
        action = "Upload";
        os.write(is.available()); // Here I write 40000
        max = is.available();
        System.out.println("Bytes to send: " + max);
        while ((bytesRead = is.read(buffer)) != -1) {
            startTiming(); // Two lines to compute the speed
            os.write(buffer, 0, bytesRead);
            stopTiming(); // Speed computation
            process += bytesRead;
        }
        os.flush();
        is.close();
        return;
    }
    if ((!closeIn) && closeOut) { // To file from socket
        action = "Download";
        int bytesToRead = is.read(); // Here it reads 78.
        System.out.println("Bytes to read: " + bytesToRead);
        max = bytesToRead;
        int nextBufferSize;
        while ((nextBufferSize = Math.min(BUFFERSIZE, bytesToRead)) > 0) {
            startTiming();
            bytesRead = is.read(buffer, 0, nextBufferSize);
            bytesToRead -= bytesRead;
            process += nextBufferSize;
            os.write(buffer, 0, bytesRead);
            stopTiming();
        }
        os.flush();
        os.close();
        return;
    }
    throw new IllegalArgumentException("The only two valid boolean combinations are: closeOut == false && closeIn == true, and closeOut == true && closeIn == false");
}
Here is the solution, thanks to James's suggestion. I think laginimaineb's answer was also a piece of the solution.
Reading the commands:
DataInputStream in = new DataInputStream(is); // Originally a BufferedReader

// Read the request line
String str;
while ((str = in.readLine()) != null) {
    if (str.trim().equals("")) {
        continue;
    }
    handleSocketInput(str);
}
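Note that DataInputStream.readLine() is deprecated because it does not convert bytes to characters properly. A minimal sketch of an alternative, assuming commands are ASCII lines terminated by '\n' (optionally preceded by '\r'): reading one byte at a time directly from the InputStream guarantees that nothing beyond the newline is ever consumed, so the binary file data that follows stays untouched (readCommandLine is an illustrative name):

static String readCommandLine(InputStream in) throws IOException {
    StringBuilder sb = new StringBuilder();
    int b;
    while ((b = in.read()) != -1 && b != '\n') {
        if (b != '\r') {
            sb.append((char) b); // ASCII assumed; no charset decoding here
        }
    }
    // null on end of stream with nothing read, mirroring readLine()'s contract
    return (b == -1 && sb.length() == 0) ? null : sb.toString();
}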
Now the flushStreamToStream:
public void flushStreamToStream(InputStream is, OutputStream os, boolean closeIn, boolean closeOut) throws IOException {
    byte[] buffer = new byte[BUFFERSIZE];
    int bytesRead;
    if ((!closeOut) && closeIn) { // To socket from file
        action = "Upload";
        DataOutputStream dos = new DataOutputStream(os);
        dos.writeInt(is.available());
        max = is.available();
        System.out.println("Bytes to send: " + max);
        while ((bytesRead = is.read(buffer)) != -1) {
            startTiming();
            dos.write(buffer, 0, bytesRead);
            stopTiming();
            process += bytesRead;
        }
        os.flush();
        is.close();
        return;
    }
    if ((!closeIn) && closeOut) { // To file from socket
        action = "Download";
        DataInputStream dis = new DataInputStream(is);
        int bytesToRead = dis.readInt();
        System.out.println("Bytes to read: " + bytesToRead);
        max = bytesToRead;
        int nextBufferSize;
        while ((nextBufferSize = Math.min(BUFFERSIZE, bytesToRead)) > 0) {
            startTiming();
            bytesRead = is.read(buffer, 0, nextBufferSize);
            bytesToRead -= bytesRead;
            process += nextBufferSize;
            os.write(buffer, 0, bytesRead);
            stopTiming();
        }
        os.flush();
        os.close();
        return;
    }
    throw new IllegalArgumentException("The only two valid boolean combinations are: closeOut == false && closeIn == true, and closeOut == true && closeIn == false");
}
Martijn.
I'm not sure I've followed your explanation.
However, yes - you have no real control over how much a BufferedReader will actually read. The point of such a reader is that it optimistically reads chunks of the underlying resource as needed to replenish its buffer. So when you first call readLine, it will see that its internal buffer doesn't have enough to serve the request, and will go off and read however many bytes it feels like into its buffer from the underlying source, which will generally be much more than you asked for just then. Once the buffer has been populated, it returns your line from the buffered content.
Thus once you wrap an input stream in a BufferedReader, you should be sure to only read that stream through that same BufferedReader. If you don't, you'll end up losing data (as some bytes will have been consumed and are now sitting in the BufferedReader's cache waiting to be served).
DataInputStream is most likely what you want to use. Also, don't use the available() method as it is generally useless.
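For illustration, here is a minimal sketch of a length-prefixed transfer along those lines, assuming the file size fits in an int and using File.length() instead of the unreliable available(); sendFile and receiveFile are illustrative names, not the asker's code:

static void sendFile(File file, OutputStream os) throws IOException {
    DataOutputStream dos = new DataOutputStream(os);
    dos.writeInt((int) file.length()); // 4-byte big-endian length prefix
    FileInputStream fis = new FileInputStream(file);
    try {
        byte[] buf = new byte[8192];
        int n;
        while ((n = fis.read(buf)) != -1) {
            dos.write(buf, 0, n);
        }
    } finally {
        fis.close();
    }
    dos.flush();
}

static void receiveFile(InputStream is, File target) throws IOException {
    DataInputStream dis = new DataInputStream(is);
    int remaining = dis.readInt(); // matches the writeInt on the other side
    FileOutputStream fos = new FileOutputStream(target);
    try {
        byte[] buf = new byte[8192];
        while (remaining > 0) {
            int n = dis.read(buf, 0, Math.min(buf.length, remaining));
            if (n == -1) {
                throw new EOFException("Stream ended with " + remaining + " bytes left");
            }
            fos.write(buf, 0, n);
            remaining -= n;
        }
    } finally {
        fos.close();
    }
}

Because the receiver reads exactly the announced number of bytes and no more, the socket stays usable for subsequent commands.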
A BufferedReader assumes that it is the only one reading from the underlying input stream.
Its purpose is to minimize the number of reads from the underlying stream (which are expensive, as they can delegate quite deeply). To that end, it keeps a buffer which it fills by reading as many bytes as possible from the underlying stream in a single call.
So yes, your diagnosis is accurate.
Just a wild stab here - 40000 is 1001110001000000 in binary. Now, the first seven bits there are 1001110, which is 78. Meaning, you're writing two bytes of information but reading only seven bits of it.
Related
I have a Java backend where users can upload files. I want to limit these uploads to a maximum size, so I want to check the number of uploaded bytes while the upload happens and abort the transmission as soon as the limit is reached.
Currently I am using InputStream.available() before allocation to estimate the size, but that is apparently unreliable.
Any suggestions?
You can use Guava's CountingInputStream or Apache Commons IO's CountingInputStream when you want to know how many bytes have been read.
On the other hand, if you want to stop the upload immediately upon reaching the limit, just count while reading chunks of bytes and close the stream once the limit has been exceeded.
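For instance, a sketch using Guava's CountingInputStream; uploadStream, out, and MAX_UPLOAD_BYTES are illustrative names, and the limit check is ours (the wrapper only counts):

CountingInputStream cis = new CountingInputStream(uploadStream); // com.google.common.io.CountingInputStream
byte[] buf = new byte[8192];
int n;
while ((n = cis.read(buf)) != -1) {
    if (cis.getCount() > MAX_UPLOAD_BYTES) {
        cis.close(); // abort the transmission
        throw new IOException("Upload exceeds " + MAX_UPLOAD_BYTES + " bytes");
    }
    out.write(buf, 0, n);
}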
You don't have to allocate the full memory up front. Just use a normally sized buffer, say 8 KB, and perform the normal copy loop, tallying the total transferred. If it exceeds the quota, stop and destroy the output file.
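A minimal sketch of that copy loop, assuming maxBytes is the quota and outFile the destination (both names illustrative):

static void copyWithQuota(InputStream in, File outFile, long maxBytes) throws IOException {
    FileOutputStream out = new FileOutputStream(outFile);
    boolean complete = false;
    try {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
            if (total > maxBytes) {
                throw new IOException("Quota exceeded: " + total + " > " + maxBytes + " bytes");
            }
            out.write(buffer, 0, n);
        }
        complete = true;
    } finally {
        out.close();
        if (!complete) {
            outFile.delete(); // destroy the partial output file
        }
    }
}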
If you're using a servlet and a multipart request you can do this:
public void doPost(final HttpServletRequest request, final HttpServletResponse response)
        throws ServletException, IOException {
    String contentLength = request.getHeader("Content-Length");
    if (contentLength != null && maxRequestSize > 0 &&
            Integer.parseInt(contentLength) > maxRequestSize) {
        throw new MyFileUploadException("Multipart request is larger than allowed size");
    }
}
My solution looks like this:
public static final byte[] readBytes(InputStream in, int maxBytes)
        throws IOException {
    byte[] result = new byte[maxBytes];
    int bytesRead = in.read(result);
    if (bytesRead > maxBytes) {
        throw new IOException("Reached max bytes (" + maxBytes + ")");
    }
    if (bytesRead < 0) {
        result = new byte[0];
    } else {
        byte[] tmp = new byte[bytesRead];
        System.arraycopy(result, 0, tmp, 0, bytesRead);
        result = tmp;
    }
    return result;
}
EDIT:
New variant
public static final byte[] readBytes(InputStream in, int bufferSize, int maxBytes)
        throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[bufferSize];
    int bytesRead;
    while ((bytesRead = in.read(buffer)) >= 0) {
        out.write(buffer, 0, bytesRead);
        if (maxBytes > 0 && out.size() > maxBytes) {
            String message = "Reached max bytes (" + maxBytes + ")";
            log.trace(message);
            throw new IOException(message);
        }
    }
    return out.toByteArray();
}
All implementations of read return the number of bytes read, so you can initialize a counter and increment it appropriately with each read to see how many bytes you've read so far. The available() method tells you how many bytes can currently be read without blocking; it has no relation to the total size of the file. It can still be useful for optimizing your reads, though: each time, you can request exactly the chunk that is readily available and avoid blocking. Also, in your case you can predict, before reading, whether the number of bytes you will have after the upcoming read will exceed your limit, and cancel even before you read the next chunk.
I'm trying to read all bytes from a web site, but I think I don't get all of them. I give a high value for the byte array length. I used this method, but it always ends in an exception.
Here is the code:
DataInputStream dis = new DataInputStream(s2.getInputStream());
byte[] bytes = new byte[900000];

// Read in the bytes
int offset = 0;
int numRead = 0;
while (offset < bytes.length
        && (numRead = dis.read(bytes, offset, bytes.length - offset)) >= 0) {
    offset += numRead;
}

// Ensure all the bytes have been read in
if (offset < bytes.length) {
    throw new IOException("Could not completely read website");
}
out.write(bytes);
Edited Version:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
InputStream is = null;
try {
    is = s2.getInputStream();
    byte[] byteChunk = new byte[4096]; // Or whatever size you want to read in at a time.
    int n;
    while ((n = is.read(byteChunk)) > 0) {
        baos.write(byteChunk, 0, n);
    }
} catch (IOException e) {
    System.err.printf("Failed while reading bytes");
    e.printStackTrace();
    // Perform any other exception handling that's appropriate.
} finally {
    if (is != null) { is.close(); }
}
byte[] asd = baos.toByteArray();
out.write(asd);
This is the problem:
if (offset < bytes.length)
You'll trigger that whenever the original data is less than 900,000 bytes: if the response is entirely complete in less than that, read() will return -1 correctly to indicate the end of the stream, leaving offset below bytes.length.
You should actually be throwing an exception if offset is equal to bytes.length, as that indicates that you might have truncated data :)
It's not clear where you got the 900,000 value from, mind you...
I would suggest that if you want to stick with the raw stream, you use Guava's ByteStreams.toByteArray method to read all the data. Alternatively, you could keep looping round, reading into a smaller buffer, writing into a ByteArrayOutputStream on each iteration.
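For instance (a sketch, reusing s2 from your code; Guava must be on the classpath):

byte[] data = com.google.common.io.ByteStreams.toByteArray(s2.getInputStream()); // reads until EOF, sizes the array exactly
out.write(data);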
I realise this doesn't answer your specific question. However, I really wouldn't hand-code this sort of thing when libraries such as HttpClient exist and are debugged, profiled, etc.
e.g. here's how to use the fluent interface
Request.Get("http://targethost/homepage").execute().returnContent();
JSoup is an alternative if you're dealing with grabbing and scraping HTML.
When given a buffer of MAX_BUFFER_SIZE, and a file that far exceeds it, how can one:
Read the file in blocks of MAX_BUFFER_SIZE?
Do it as fast as possible?
I tried using NIO:
RandomAccessFile aFile = new RandomAccessFile(fileName, "r");
FileChannel inChannel = aFile.getChannel();
ByteBuffer buffer = ByteBuffer.allocate(CAPACITY);

int bytesRead = inChannel.read(buffer);
buffer.flip();
while (buffer.hasRemaining()) {
    buffer.get();
}
buffer.clear();
bytesRead = inChannel.read(buffer);
aFile.close();
And regular IO
InputStream in = new FileInputStream(fileName);
long length = fileName.length();
if (length > Integer.MAX_VALUE) {
    throw new IOException("File is too large!");
}
byte[] bytes = new byte[(int) length];

int offset = 0;
int numRead = 0;
while (offset < bytes.length
        && (numRead = in.read(bytes, offset, bytes.length - offset)) >= 0) {
    offset += numRead;
}
if (offset < bytes.length) {
    throw new IOException("Could not completely read file " + fileName);
}
in.close();
It turns out that regular IO is about 100 times faster at doing the same thing as NIO. Am I missing something? Is this expected? Is there a faster way to read the file in buffer chunks?
Ultimately I am working with a large file that I don't have enough memory to read all at once. Instead, I'd like to read it incrementally in blocks that would then be used for processing.
If you want to make your first example faster:

FileChannel inChannel = new FileInputStream(fileName).getChannel();
ByteBuffer buffer = ByteBuffer.allocateDirect(CAPACITY);
while (inChannel.read(buffer) > 0)
    buffer.clear(); // do something with the data and clear/compact it.
inChannel.close();
If you want it to be even faster:

FileChannel inChannel = new RandomAccessFile(fileName, "r").getChannel();
MappedByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
// access the buffer as you wish.
inChannel.close();
This can take 10-20 microseconds for files up to 2 GB in size.
Assuming that you need to read the entire file into memory at once (as you're currently doing), neither reading smaller chunks nor NIO is going to help you here.
In fact, you'd probably be best reading larger chunks - which your regular IO code is automatically doing for you.
Your NIO code is currently slower, because you're only reading one byte at a time (using buffer.get();).
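As a sketch, reusing the inChannel and CAPACITY from your example: draining the buffer with one bulk get per fill avoids the per-byte calls.

ByteBuffer buffer = ByteBuffer.allocate(CAPACITY);
byte[] chunk = new byte[CAPACITY];
while (inChannel.read(buffer) > 0) {
    buffer.flip();
    int len = buffer.remaining();
    buffer.get(chunk, 0, len); // one bulk copy instead of len calls to get()
    // process chunk[0..len) here
    buffer.clear();
}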
If you want to process in chunks - for example, transferring between streams - here is a standard way of doing it without NIO:
InputStream is = ...;
OutputStream os = ...;

byte[] buffer = new byte[1024];
int read;
while ((read = is.read(buffer)) != -1) {
    os.write(buffer, 0, read);
}
This uses a buffer size of only 1 KB, but can transfer an unlimited amount of data.
(If you extend your question with details of what you're actually looking to do at a functional level, I could improve this to a better answer.)
I'm dealing with the following code that is used to split a large file into a set of smaller files:
try {
    FileInputStream input = new FileInputStream(this.fileToSplit);
    BufferedInputStream iBuff = new BufferedInputStream(input);
    int i = 0;

    FileOutputStream output = new FileOutputStream(fileArr[i]);
    BufferedOutputStream oBuff = new BufferedOutputStream(output);

    int buffSize = 8192;
    byte[] buffer = new byte[buffSize];
    while (true) {
        if (iBuff.available() < buffSize) {
            byte[] newBuff = new byte[iBuff.available()];
            iBuff.read(newBuff);
            oBuff.write(newBuff);
            oBuff.flush();
            oBuff.close();
            break;
        }
        int r = iBuff.read(buffer);
        if (fileArr[i].length() >= this.partSize) {
            oBuff.flush();
            oBuff.close();
            ++i;
            output = new FileOutputStream(fileArr[i]);
            oBuff = new BufferedOutputStream(output);
        }
        oBuff.write(buffer);
    }
} catch (Exception e) {
    e.printStackTrace();
}
This is the weird behavior I'm seeing: when I run this code on a 3 GB file, the initial iBuff.available() call returns a value of approximately 2,100,000,000 and the code works fine. When I run this code on a 12 GB file, the initial iBuff.available() call only returns a value of 200,000,000 (which is smaller than the split file size of 500,000,000 and causes the processing to go awry).
I'm thinking this discrepancy in behavior has something to do with the fact that this is on 32-bit Windows. I'm going to run a couple more tests on a 4.5 GB file and a 3.5 GB file. If the 3.5 GB file works and the 4.5 GB one doesn't, that will further confirm the theory that it's a 32-bit vs 64-bit issue, since 4 GB would then be the threshold.
Well, if you read the javadoc, it quite clearly states:
Returns the number of bytes that can be read from this input stream without blocking (emphasis added by me)
So it's quite clear that what you want is not what this method offers. Depending on the underlying InputStream, you may get problems much earlier (e.g. a stream over the network with a server that doesn't announce the file size: you'd have to read and buffer the complete file just to return the "correct" available() count, which would take a lot of time. What if you only want to read a header?)
So the correct way to handle this is to change your parsing method to handle the file in pieces. Personally, I don't see much reason to even use available() here: just calling read() and stopping as soon as read() returns -1 should work fine. It can be made more complicated if you want to ensure that every file really contains blockSize bytes; just add an internal loop if that scenario is important (see the sketch after the code below).
int blockSize = XXX;
byte[] buffer = new byte[blockSize];

int i = 0;
int read = in.read(buffer);
while (read != -1) {
    out[i++].write(buffer, 0, read);
    read = in.read(buffer);
}
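And the internal fill loop mentioned above, as a rough sketch (readBlock is an illustrative name): it keeps reading until the buffer is full or EOF arrives, so every split file except possibly the last really gets blockSize bytes:

static int readBlock(InputStream in, byte[] buffer) throws IOException {
    int filled = 0;
    while (filled < buffer.length) {
        int n = in.read(buffer, filled, buffer.length - filled);
        if (n == -1) {
            break; // EOF: return the partial block
        }
        filled += n;
    }
    return filled;
}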
There are few correct uses of available(), and this isn't one of them. You don't need all that junk. Memorize this:
int count;
byte[] buffer = new byte[8192]; // or more
while ((count = in.read(buffer)) > 0)
    out.write(buffer, 0, count);
That's the canonical way to copy a stream in Java.
You should not use the InputStream.available() function at all. It is only needed in very special circumstances.
You should also not create byte arrays that are larger than 1 MB. It's a waste of memory. The commonly accepted way is to read a small block (4 kB up to 1 MB) from the source file and then store only as many bytes as you have read in the destination file. Do that until you have reached the end of the source file.
available() isn't a measure of how much is still to be read, but rather of how much is guaranteed to be readable before the stream might hit EOF or block waiting for input.
Also, put the close calls in finally blocks:
BufferedInputStream iBuff = new BufferedInputStream(input);
int i = 0;
FileOutputStream output;
BufferedOutputStream oBuff = null;
try {
    int buffSize = 8192;
    int offset = 0;
    byte[] buffer = new byte[buffSize];
    while (true) {
        int len = iBuff.read(buffer, offset, buffSize - offset);
        if (len == -1) { // EOF: write out the last partial chunk
            if (offset > 0) {
                try {
                    output = new FileOutputStream(fileArr[i]);
                    oBuff = new BufferedOutputStream(output);
                    oBuff.write(buffer, 0, offset);
                } finally {
                    if (oBuff != null) oBuff.close();
                }
            }
            break;
        }
        offset += len;
        if (offset == buffSize) { // buffer full: write it out to the next file
            try {
                output = new FileOutputStream(fileArr[i]);
                oBuff = new BufferedOutputStream(output);
                oBuff.write(buffer);
            } finally {
                if (oBuff != null) oBuff.close();
            }
            ++i;
            offset = 0;
        }
    }
} finally {
    iBuff.close();
}
Here is some code that splits a file. If performance is critical to you, you can experiment with the buffer size.
package so6164853;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Formatter;

public class FileSplitter {

    private static String printf(String fmt, Object... args) {
        Formatter formatter = new Formatter();
        formatter.format(fmt, args);
        return formatter.out().toString();
    }

    /**
     * @param outputPattern see {@link Formatter}
     */
    public static void splitFile(String inputFilename, long fragmentSize, String outputPattern) throws IOException {
        InputStream input = new FileInputStream(inputFilename);
        try {
            byte[] buffer = new byte[65536];
            int outputFileNo = 0;
            OutputStream output = null;
            long writtenToOutput = 0;
            try {
                while (true) {
                    int bytesToRead = buffer.length;
                    if (bytesToRead > fragmentSize - writtenToOutput) {
                        bytesToRead = (int) (fragmentSize - writtenToOutput);
                    }
                    int bytesRead = input.read(buffer, 0, bytesToRead);
                    if (bytesRead != -1) {
                        if (output == null) {
                            String outputName = printf(outputPattern, outputFileNo);
                            outputFileNo++;
                            output = new FileOutputStream(outputName);
                            writtenToOutput = 0;
                        }
                        output.write(buffer, 0, bytesRead);
                        writtenToOutput += bytesRead;
                    }
                    if (output != null && (bytesRead == -1 || writtenToOutput == fragmentSize)) {
                        output.close();
                        output = null;
                    }
                    if (bytesRead == -1) {
                        break;
                    }
                }
            } finally {
                if (output != null) {
                    output.close();
                }
            }
        } finally {
            input.close();
        }
    }

    public static void main(String[] args) throws IOException {
        splitFile("d:/backup.zip", 1440 << 10, "d:/backup.zip.part%04d");
    }
}
Some remarks:
Only those bytes that have actually been read from the input file are written to one of the output files.
I left out the BufferedInputStream and BufferedOutputStream since their buffer size is only 8192 bytes, which is less than the buffer I use in the code.
As soon as I open a file, I make sure that it will be closed at the end, no matter what happens. (The finally blocks.)
The code contains only one call to input.read and only one call to output.write. This makes it easier to check for correctness.
The code for splitting a file does not catch the IOException, since it doesn't know what to do in such a case. It is just passed to the caller; maybe the caller knows how to handle it.
Both @ratchet and @Voo are correct.
As for what is happening:
The maximum int value is 2,147,483,647 (http://download.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html).
14 gigabytes is 15,032,385,536, which clearly doesn't fit in an int.
Note that according to the API javadoc (http://download.oracle.com/javase/6/docs/api/java/io/BufferedInputStream.html#available%28%29), and as stated by @Voo, this doesn't break the method contract at all (it just isn't what you are looking for).
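To illustrate with the values from above, a tiny sketch of what happens when such a size is forced into an int:

long fourteenGiB = 14L * 1024 * 1024 * 1024; // 15,032,385,536
System.out.println((int) fourteenGiB);       // prints -2147483648: only the low 32 bits survive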
I have a Java class where I'm reading data in via an InputStream:
byte[] b = null;
try {
    b = new byte[in.available()];
    in.read(b);
} catch (IOException e) {
    e.printStackTrace();
}
It works perfectly when I run my app from the IDE (Eclipse).
But when I export my project and it's packed in a JAR, the read command doesn't read all the data. How can I fix it?
This problem mostly occurs when the InputStream is a file (~10 KB).
Thanks!
Usually I prefer using a fixed-size buffer when reading from an input stream. As evilone pointed out, using available() as the buffer size might not be a good idea because, say, if you are reading a remote resource, you might not know the available bytes in advance. You can read the javadoc of InputStream to get more insight.
Here is the code snippet I usually use for reading input stream:
byte[] buffer = new byte[BUFFER_SIZE];
int bytesRead = 0;
while ((bytesRead = in.read(buffer)) >= 0) {
    for (int i = 0; i < bytesRead; i++) {
        // Do whatever you need with the bytes here
    }
}
The version of read() I'm using here will fill the given buffer as much as possible and return the number of bytes actually read. This means there is a chance that your buffer may contain trailing garbage data, so it is very important to use the bytes only up to bytesRead.
Note the condition (bytesRead = in.read(buffer)) >= 0: there is nothing in the InputStream spec saying that read() cannot read 0 bytes. You may need to handle the case where read() reads 0 bytes as a special case, depending on your situation. For local files I have never experienced such a case; however, when reading remote resources I have actually seen read() return 0 bytes constantly, turning the above code into an infinite loop. I solved the infinite-loop problem by counting the number of times I read 0 bytes; when the counter exceeded a threshold, I threw an exception. You may not encounter this problem, but just keep it in mind :) A sketch of that guard follows.
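A minimal sketch of the zero-read guard just described (ZERO_READ_LIMIT is an illustrative constant, not from the original code):

int zeroReads = 0;
int bytesRead;
while ((bytesRead = in.read(buffer)) >= 0) {
    if (bytesRead == 0) {
        if (++zeroReads > ZERO_READ_LIMIT) {
            throw new IOException("Too many zero-byte reads; stream appears stuck");
        }
        continue;
    }
    zeroReads = 0; // made progress, reset the counter
    // process buffer[0..bytesRead) here
}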
I would also stay away from creating a new byte array for each read, for performance reasons.
read() will return -1 when the InputStream is depleted. There is also a version of read which takes an array; this allows you to do chunked reads. It returns the number of bytes actually read, or -1 when at the end of the InputStream. Combine this with a dynamic buffer such as ByteArrayOutputStream to get the following:
InputStream in = ...
ByteArrayOutputStream buffer = new ByteArrayOutputStream();

int read;
byte[] input = new byte[4096];
while (-1 != (read = in.read(input))) {
    buffer.write(input, 0, read);
}
input = buffer.toByteArray();
This cuts down a lot on the number of methods you have to invoke and allows the ByteArrayOutputStream to grow its internal buffer faster.
File file = new File("/path/to/file");
try {
    InputStream is = new FileInputStream(file);
    byte[] bytes = IOUtils.toByteArray(is); // IOUtils is from Apache Commons IO
    System.out.println("Byte array size: " + bytes.length);
} catch (IOException e) {
    e.printStackTrace();
}
Below is a snippet of code that downloads a file (*.png, *.jpeg, *.gif, ...) and writes it to a BufferedOutputStream that wraps the HttpServletResponse.
BufferedInputStream inputStream = bo.getBufferedInputStream(imageFile);
try {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    int bytesRead = 0;
    byte[] input = new byte[DefaultBufferSizeIndicator.getDefaultBufferSize()];
    while (-1 != (bytesRead = inputStream.read(input))) {
        buffer.write(input, 0, bytesRead);
    }
    input = buffer.toByteArray();

    response.reset();
    response.setBufferSize(DefaultBufferSizeIndicator.getDefaultBufferSize());
    response.setContentType(mimeType);
    // Here's the secret. Content-Length should equal the number of bytes read.
    response.setHeader("Content-Length", String.valueOf(buffer.size()));
    response.setHeader("Content-Disposition", "inline; filename=\"" + imageFile.getName() + "\"");

    BufferedOutputStream outputStream = new BufferedOutputStream(response.getOutputStream(), DefaultBufferSizeIndicator.getDefaultBufferSize());
    try {
        outputStream.write(input, 0, buffer.size());
    } finally {
        ImageBO.close(outputStream);
    }
} finally {
    ImageBO.close(inputStream);
}
Hope this helps.