I'm trying to read information sent from a client on Android using the TCP protocol. In my server I have this code:
InputStream input = clienteSocket.getInputStream();
int c = input.read();
c will contain the ASCII code that the client sent.
I also can get this by writing:
BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
I would like to know what is the difference between both methods.
You're comparing apples and oranges here.
Your first example reads one byte from the stream, unbuffered, and returns the value of that byte. (Adding 'ASCII number' to that adds no actual information.)
Your second example sets up a buffered reader, which can read chars from the stream, buffered, but it doesn't actually read anything.
You could set up two further examples:
InputStream is = new BufferedInputStream(socket.getInputStream());
int c = is.read();
This reads a byte, with buffering.
Reader reader = new InputStreamReader(socket.getInputStream());
int c = reader.read();
This reads a char, with a little buffering: not as much as BufferedReader provides.
The realistic choices are between the two buffered versions, for efficiency reasons as outlined by @StephenC, and the choice between them is dictated by whether you want bytes or chars.
The buffered approach is better because (in most cases) it reduces the number of syscalls that the JVM needs to make to the operating system. Since syscalls are relatively expensive, buffering generally gives you better performance.
In your specific example:
Each time you call input.read() on an unbuffered input stream you do a syscall.
The first time you do an input.read() (or other read operation) on a buffered input stream, it reads a number of bytes into an in-memory byte array. In the second, third, etc. calls to input.read(), the read will typically return a byte out of the in-memory buffer, without making a syscall.
In your example, the only case where using a buffered stream doesn't help would be if you are going to read only one byte from the socket, and then close it.
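As a rough illustration of the point about syscalls, the sketch below counts how often the underlying stream is asked for data. CountingInputStream is a hypothetical helper written for this example (it is not a JDK class), and a ByteArrayInputStream stands in for the socket, so the counts only model syscalls rather than measure them.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class BufferingDemo {
    // Hypothetical helper: counts every read request that reaches the wrapped stream.
    static final class CountingInputStream extends FilterInputStream {
        int requests = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            requests++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            requests++;
            return super.read(b, off, len);
        }
    }

    // Reads the source one byte at a time until end of stream and returns how many
    // requests reached it, either directly or through a BufferedInputStream.
    static int requestsNeeded(int size, boolean buffered) throws IOException {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        InputStream in = buffered ? new BufferedInputStream(source) : source;
        while (in.read() != -1) {
            // discard the byte; only the request count matters here
        }
        return source.requests;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered requests: " + requestsNeeded(10_000, false));
        System.out.println("buffered requests:   " + requestsNeeded(10_000, true));
    }
}
```

The unbuffered run needs at least one request per byte, while the buffered run needs only a handful, because each request fills the internal buffer.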
UPDATE
I didn't notice that you were comparing an unbuffered InputStream with a buffered >> Reader <<. As @EJP points out, this is "comparing apples and oranges". The functionality of the two versions is different. One reads bytes and the other reads characters.
(And if you don't understand that distinction ... and why it is an important distinction ... you would be advised to read the Java Tutorial lesson on Basic I/O. Particularly the sections on byte streams, character streams and buffered streams.)
Related
if the read method reads a byte of data from the input stream,
when it has to read a char, does it read twice, byte by byte, since a char is 2 bytes?
InputStream operates on bytes. It is the underlying I/O abstraction in Java. It can read a single byte or a sequence of bytes, depending on what the caller requests. But it knows nothing about characters, so it cannot, by itself, decide to read two bytes for a character. A Reader would have to request this.
If you need to read characters, use Reader to read them from the InputStream.
(Similarly, to read serialized Java objects, you would use ObjectInputStream, which again reads them from the InputStream. Or you can use Scanner to read a variety of inputs from numbers to text, again from an InputStream.)
The purpose of this abstraction is separation of responsibilities -
The InputStream provides a stream of bytes and handles all underlying logic (file reading / network / ...).
The Reader converts the stream of bytes to stream of characters, and doesn't care where the data came from.
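To make the byte/character split concrete, here is a small sketch that reads the same data once through an InputStream and once through a Reader. The sample text and the choice of UTF-8 are assumptions made for the example. It also shows that a Reader does not simply take two bytes per char: how many bytes make up one character depends on the encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class BytesVersusChars {
    // Counts the values returned by InputStream.read(): one per byte.
    static int countBytes(byte[] data) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    // Counts the values returned by Reader.read(): one per decoded char.
    static int countChars(byte[] data) throws IOException {
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        int count = 0;
        while (reader.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // In UTF-8, 'a' takes 1 byte, U+00E9 (e with acute) takes 2 and U+20AC (euro sign) takes 3.
        byte[] data = "a\u00e9\u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes: " + countBytes(data)); // 6
        System.out.println("chars: " + countChars(data)); // 3
    }
}
```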
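As per the Oracle documentation available online at https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#read(), read() reads the next single byte and returns it as an int in the range 0 to 255, or -1 at the end of the stream. It is the read(byte[]) overloads that read into a byte array.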
I'm having trouble understanding how do buffers work in Java IO.
Excuse me if I don't express myself as clearly as I would like, I'm not
strong on all these concepts.
As I understand it, in Java there are readers/writers, for chars (meaning the
possibility of more than one byte per char and encoding options), and streams
for bytes.
Then there are some classes that use buffers.
I believe that a buffer is used mainly so that we can avoid unnecessary system
calls that would involve expensive operations, like accessing a slower device, by
storing more in memory and making the system call useful for more data.
The issue I have is that there seem to be buffering classes for both readers/writers and streams.
I would like to know if buffering characters is enough, or if, by the time those
bytes get to the streaming class, they would be flushed on for example newlines,
as some classes state.
The best I've found about this is this post
How do these different code snippets compare in regard to buffering?
Does autoflush thwart the intent of buffering?
Should there be only one buffer in play, and if so, where (writer or stream)
and why?:
// resolveSocket is some socket
PrintWriter out = new PrintWriter(
resolveSocket.getOutputStream(),
true);
// vs
PrintWriter out = new PrintWriter(
new OutputStreamWriter(
new BufferedOutputStream(resolveSocket.getOutputStream()),
StandardCharsets.UTF_8),
true);
My interest is first and foremost to understand buffering better, and practical only after that.
Thank you all in advance.
EDIT: Found this other stack overflow question interesting and related.
Also this other one talks about BufferedOutputStream.
It may help you to understand the difference between a writer and a stream. An OutputStream is a binary sink. A Writer has a character encoding and understands about newlines. The OutputStreamWriter allows you to send character encoded data, and have it correctly translated to binary for consumption by an OutputStream. What you probably want is:
PrintWriter out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(resolveSocket.getOutputStream())));
This will allow you to use your PrintWriter to output chars, have it buffered by the BufferedWriter, and then translated by the OutputStreamWriter for consumption by the socket output stream.
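On the question of whether autoflush thwarts buffering, a small experiment may help. This is only a sketch: a ByteArrayOutputStream stands in for the socket's output stream, and the explicit UTF-8 charset is an assumption. It records how many bytes have reached the sink after print(), after println() and after an explicit flush().

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class AutoFlushDemo {
    // Returns the sink size after print(), after println() and after flush().
    static int[] sizesAfterEachStep(boolean autoFlush) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for the socket
        PrintWriter out = new PrintWriter(
                new BufferedWriter(new OutputStreamWriter(sink, StandardCharsets.UTF_8)),
                autoFlush);
        out.print("hello");       // stays in the BufferedWriter either way
        int afterPrint = sink.size();
        out.println(" world");    // flushes the whole chain only when autoFlush is true
        int afterPrintln = sink.size();
        out.flush();              // always pushes everything through
        int afterFlush = sink.size();
        return new int[] {afterPrint, afterPrintln, afterFlush};
    }

    public static void main(String[] args) {
        System.out.println("autoFlush=true:  " + Arrays.toString(sizesAfterEachStep(true)));
        System.out.println("autoFlush=false: " + Arrays.toString(sizesAfterEachStep(false)));
    }
}
```

So with autoflush the buffer still batches everything written between two println() calls, but it cannot batch across lines. Whether that matters depends on how chatty the protocol is.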
Sorry if this question is a duplicate, but I didn't get the answer I was looking for.
The Java docs say this:
In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders. For example,
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
will buffer the input from the specified file. Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient.
My first question is: if BufferedReader can read bytes, then why can't we use it to work on images, which are made of bytes?
My second question is: does BufferedReader store characters in a buffer, and what is the meaning of this line
will buffer the input from the specified file.
My third question is what is the meaning of this line
In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream.
There are two questions here.
1. Buffering
Imagine you lived a mile from your nearest water source, and you drink a cup of water every hour. Well, you wouldn't walk all the way to the water for every cup. Go once a day, and come home with a bucket full of enough water to fill the cup 24 times.
The bucket is a buffer.
Imagine your village is supplied water by a river. But sometimes the river runs dry for a month; other times the river brings so much water that the village floods. So you build a dam, and behind the dam there is a reservoir. The reservoir fills up in the rainy season and gradually empties in the dry season. The village gets a steady flow of water all year round.
The reservoir is a buffer.
Data streams in computing are similar to both those scenarios. For example, you can get several kilobytes of data from a filesystem in a single OS system call, but if you want to process one character at a time, you need something similar to a reservoir.
A BufferedReader contains within it another Reader (for example a FileReader), which is the river -- and an array of chars, which is the reservoir. Every time you read from it, it does something like:
if there are not enough chars in the "reservoir" to fulfil this request
top up the "reservoir" by reading from the underlying Reader
endif
return some chars from the "reservoir".
However when you use a BufferedReader, you don't need to know how it works, only that it works.
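If you are curious anyway, the reservoir idea can be sketched in a few lines. ReservoirReader below is a toy class invented for this example. It is not how the JDK's BufferedReader is actually written, only an illustration of the same top-up-then-serve pattern.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Toy buffered reader: an illustration only, not the JDK implementation.
final class ReservoirReader {
    private final Reader source;     // the "river"
    private final char[] reservoir;  // the "reservoir"
    private int next = 0;            // index of the next char to hand out
    private int filled = 0;          // number of valid chars in the reservoir
    int refills = 0;                 // how often we had to go back to the river

    ReservoirReader(Reader source, int capacity) {
        this.source = source;
        this.reservoir = new char[capacity];
    }

    // Returns the next char as an int, or -1 at end of stream.
    int read() throws IOException {
        if (next >= filled) {
            // Reservoir is empty: top it up with one bulk read.
            int n = source.read(reservoir, 0, reservoir.length);
            if (n <= 0) {
                return -1; // this sketch treats an empty read as end of stream
            }
            refills++;
            filled = n;
            next = 0;
        }
        return reservoir[next++];
    }

    public static void main(String[] args) throws IOException {
        ReservoirReader reader = new ReservoirReader(new StringReader("hello world"), 4);
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            text.append((char) c);
        }
        System.out.println(text + " (refills: " + reader.refills + ")");
    }
}
```

Eleven characters are served with three trips to the source instead of eleven. A real buffer of 8192 chars makes the ratio far better.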
2. Suitability for images
It's important to understand that BufferedReader and FileReader are examples of Readers. You might not have covered polymorphism in your programming education yet, so when you do, remember this. It means that if you have code which uses FileReader -- but only the aspects of it that conform to Reader -- then you can substitute a BufferedReader and it will work just the same.
It's a good habit to declare variables as the most general class that works:
Reader reader = new FileReader(file);
... because then this would be the only change you need to add buffering:
Reader reader = new BufferedReader(new FileReader(file));
I took that detour because it's all Readers that are less suitable for images.
Reader has two read methods:
int read(); // returns one character, cast to an int
int read(char[] block); // reads into block, returns how many chars it read
The second form is unsuitable for images because it definitely reads chars, not ints.
The first form looks as if it might be OK -- after all, it reads ints. And indeed, if you just use a FileReader, it might well work.
However, think about how a BufferedReader wrapped around a FileReader will work. The first time you call BufferedReader.read(), it will call FileReader.read(buffer) to fill its buffer. Then it will cast the first char of the buffer back to an int, and return that.
Especially when you bring multi-byte charsets into the picture, that can cause problems.
So if you want to read integers, use InputStream not Reader. InputStream has int read(byte[] buf, int offset, int length) -- bytes are much more reliably cast back and forth from int than chars.
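To see the kind of damage a Reader can do to binary data, here is a hedged sketch. The four bytes are the ones a JPEG/JFIF file commonly starts with, and UTF-8 is an assumed charset. Other charsets fail differently, and some single-byte charsets happen to survive the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BinaryThroughReader {
    // Decodes the bytes as UTF-8 text and encodes the text back to bytes.
    static byte[] roundTrip(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // Returns the first value a UTF-8 Reader reports for the given bytes.
    static int firstChar(byte[] data) throws IOException {
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        return reader.read();
    }

    public static void main(String[] args) throws IOException {
        // Not valid UTF-8, like most binary data.
        byte[] header = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        System.out.println("survives round trip: "
                + Arrays.equals(header, roundTrip(header)));
        System.out.printf("first byte: 0x%02X, first char: U+%04X%n",
                header[0] & 0xFF, firstChar(header));
    }
}
```

The decoder replaces the bytes it cannot interpret with U+FFFD, so the original values are gone by the time the Reader hands anything back.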
Readers (and Writers) in java are specialized classes for dealing with text (character) streams - the concept of a line is meaningless in any other type of stream.
for the general IO equivalent, have a look at BufferedInputStream
so, to answer your questions:
1. While the reader does eventually read bytes, it converts them to characters. It is not intended to read anything else (like images) - use the InputStream family of classes for that.
2. A buffered reader will read large blocks of data from the underlying stream (which may be a file, socket, or anything else) into a buffer in memory and will then serve read requests from this buffer until the buffer is emptied. This behaviour of reading large chunks instead of smaller chunks every time improves performance.
3. It means that if you don't wrap a reader in a buffered reader, then every time you want to read a single character, it will access the disk/network to get just the single character you want. Doing I/O in such small chunks is usually terrible for performance.
1. By default it converts the bytes to characters, but an image is not character data; it is raw bytes of pixel data. So you cannot use it for images.
2. It is buffering, which means it reads a chunk of data into a char array. You can see this behaviour in the code:
public BufferedReader(Reader in) {
this(in, defaultCharBufferSize);
}
and the defaultCharBufferSize is as mentioned below:
private static int defaultCharBufferSize = 8192;
3. Without buffering, every read operation reads only one character from the underlying stream.
So in a nutshell, buffered means it first reads a chunk of character data and keeps it in a char array; that chunk is processed, and then it reads the next chunk in the same way until it reaches the end of the stream.
You can refer the following to get to know more
BufferedReader
I am working on a project and have a question about Java sockets. The source file can be found here.
After successfully transmitting the file size in plain text I need to transfer binary data. (DVD .Vob files)
I have a loop such as
// Read this files size
long fileSize = Integer.parseInt(in.readLine());
// Read the block size they are going to use
int blockSize = Integer.parseInt(in.readLine());
byte[] buffer = new byte[blockSize];
// Bytes "red"
long bytesRead = 0;
int read = 0;
while(bytesRead < fileSize){
System.out.println("received " + bytesRead + " bytes" + " of " + fileSize + " bytes in file " + fileName);
read = socket.getInputStream().read(buffer);
if(read < 0){
// Should never get here since we know how many bytes there are
System.out.println("DANGER WILL ROBINSON");
break;
}
binWriter.write(buffer,0,read);
bytesRead += read;
}
I receive a varying number of bytes, close to 99% of the file. I am using Socket, which is TCP based,
so I shouldn't have to worry about lower layer transmission errors.
The received number changes but is always very near the end
received 7258144 bytes of 7266304 bytes in file GLADIATOR/VIDEO_TS/VTS_07_1.VOB
The app then hangs there in a blocking read. I am confounded. The server is sending the correct
file size and has a successful implementation in Ruby but I can't get the Java version to work.
Why would I read fewer bytes than are sent over a TCP socket?
The above is because of a bug many of you pointed out below.
BufferedReader ate 8 KB of my socket's input. The correct implementation can be found here.
If your in is a BufferedReader then you've run into the common problem with buffering more than needed. The default buffer size of BufferedReader is 8192 characters which is approximately the difference between what you expected and what you got. So the data you are missing is inside BufferedReader's internal buffer, converted to characters (I wonder why it didn't break with some kind of conversion error).
The only workaround is to read the first lines byte-by-byte without using any buffered reader classes. Java doesn't provide an unbuffered InputStreamReader with readLine() capability as far as I know (with the exception of the deprecated DataInputStream.readLine(), as indicated in the comments below), so you have to do it yourself. I would do it by reading single bytes, putting them into a ByteArrayOutputStream until I encounter an EOL, then converting the resulting byte array into a String using the String constructor with the appropriate encoding.
Note that while you can't use a BufferedReader, nothing stops you from using a BufferedInputStream from the very beginning, which will make byte-by-byte reads more efficient.
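A minimal sketch of that approach might look like the following. The helper names readLine and readFully, the US-ASCII header and the tiny in-memory "wire" are assumptions made for the example. In the real program the stream would be a BufferedInputStream around socket.getInputStream(), used for the whole life of the connection.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LineThenBinary {
    // Reads bytes up to (not including) '\n' and decodes them; returns null at end of stream.
    static String readLine(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            line.write(b);
        }
        if (b == -1 && line.size() == 0) {
            return null;
        }
        String text = new String(line.toByteArray(), charset);
        // Tolerate "\r\n" line endings.
        return text.endsWith("\r") ? text.substring(0, text.length() - 1) : text;
    }

    // Reads exactly length bytes or fails with EOFException.
    static byte[] readFully(InputStream in, int length) throws IOException {
        byte[] data = new byte[length];
        int pos = 0;
        while (pos < length) {
            int n = in.read(data, pos, length - pos);
            if (n == -1) {
                throw new EOFException("stream ended after " + pos + " bytes");
            }
            pos += n;
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the socket: a one-line text header, then 4 bytes of binary data.
        byte[] wire = {'4', '\n', 1, 2, (byte) 0xFF, 0};
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(wire));
        int size = Integer.parseInt(readLine(in, StandardCharsets.US_ASCII));
        byte[] payload = readFully(in, size);
        System.out.println("payload bytes: " + payload.length);
    }
}
```

Because the text header and the binary payload are both pulled from the same buffered stream, nothing is stranded in a second buffer.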
Update
In fact, I am doing something like this right now, only a bit more complicated. It is an application protocol that involves exchanging some data structures that are nicely represented in XML, but they sometimes have binary data attached to them. We implemented this by having two attributes in the root XML: fragmentLength and isLastFragment. The first one indicates how many bytes of binary data follow the XML part and isLastFragment is a boolean attribute indicating the last fragment so the reading side knows that there will be no more binary data. XML is null-terminated so we don't have to deal with readLine(). The code for reading looks like this:
InputStream ins = new BufferedInputStream(socket.getInputStream());
while (!finished) {
ByteArrayOutputStream buf = new ByteArrayOutputStream();
int b;
while ((b = ins.read()) > 0) {
buf.write(b);
}
if (b == -1)
throw new EOFException("EOF while reading from socket");
// b == 0
Document xml = readXML(new ByteArrayInputStream(buf.toByteArray()));
processAnswers(xml);
Element root = xml.getDocumentElement();
if (root.hasAttribute("fragmentLength")) {
int length = DatatypeConverter.parseInt(
root.getAttribute("fragmentLength"));
boolean last = DatatypeConverter.parseBoolean(
root.getAttribute("isLastFragment"));
int read = 0;
while (read < length) {
// split incoming fragment into 4Kb blocks so we don't run
// out of memory if the client sent a really large fragment
int l = Math.min(length - read, 4096);
byte[] fragment = new byte[l];
int pos = 0;
while (pos < l) {
int c = ins.read(fragment, pos, l - pos);
if (c == -1)
throw new EOFException(
"Preliminary EOF while reading fragment");
pos += c;
read += c;
}
// process fragment
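} // end of while (read < length)
} // end of if (root.hasAttribute("fragmentLength"))
} // end of while (!finished)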
Using null-terminated XML for this turned out to be a really great thing as we can add additional attributes and elements without changing the transport protocol. At the transport level we also don't have to worry about handling UTF-8 because XML parser will do it for us. In your case you're probably fine with those two lines, but if you need to add more metadata later you may wish to consider null-terminated XML too.
Here is your problem. In the first few lines of the program you're using in.readLine(), and in is probably some sort of BufferedReader. BufferedReaders will read data off the socket in 8K chunks. So when you did the first readLine() it read the first 8K into the buffer. The first 8K contains your two numbers followed by newlines, then some portion of the head of the VOB file (that's the missing chunk). Now when you switch to using getInputStream() off the socket, you are 8K into the transmission while assuming you're starting at zero.
socket.getInputStream().read(buffer); // you can't do this without losing data.
While the BufferedReader is nice for reading character data, switching between binary and character data in a stream is not possible with it. You'll have to switch to using InputStream instead of Reader and convert the first few portions by hand to character data. If you read the file using a buffered byte array you can read the first chunk, look for your newlines and convert everything to the left of that to character data. Then write everything to the right to your file, then start reading the rest of the file.
This used to be easier with DataInputStream, but it doesn't do a good job handling character conversion for you (readLine is deprecated with BufferedReader being the only replacement - doh). Probably should write a DataInputStream replacement that under the covers uses Charset to properly handle string conversion. Then switching between characters and binary would be easier.
Your basic problem is that BufferedReader will read as much data as is available and place it in its buffer. It will give you the data as you ask for it. This is the whole point of buffering, i.e. to reduce the number of calls to the OS. The only safe way to use a buffered input is to use the same buffer over the life of the connection.
In your case, you only use the buffer to read two lines, however it is highly likely that 8192 bytes has been read into the buffer. (The default size of the buffer) Say the first two lines consist of 32 bytes, this leaves 8160 waiting for you to read, however you by-pass the buffer to perform the read() on the socket directly leading to 8160 bytes left in the buffer you end up discarding. (the amount you are missing)
BTW: You should be able to see this in a debugger if you inspect the contents of your buffered reader.
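The same effect can be observed without a debugger. In this sketch a ByteArrayInputStream stands in for the socket, with a made-up one-line header followed by 100 payload bytes. After a single readLine(), it checks how much is still available on the raw stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class SwallowedBytesDemo {
    // Returns how many bytes are still available on the raw stream after
    // a BufferedReader has read only the first line from it.
    static int leftOnRawStream(byte[] wire) throws IOException {
        ByteArrayInputStream raw = new ByteArrayInputStream(wire);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(raw, StandardCharsets.US_ASCII));
        in.readLine(); // consumes the header line, and buffers whatever else it can get
        return raw.available();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = new byte[2 + 100]; // header "7\n" followed by 100 payload bytes
        wire[0] = '7';
        wire[1] = '\n';
        System.out.println("payload bytes sent:        100");
        System.out.println("bytes left on raw stream:  " + leftOnRawStream(wire));
    }
}
```

Whatever is no longer on the raw stream is sitting inside the reader's buffers, which is exactly the data that goes missing when the code switches to socket.getInputStream().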
Sergei may have been right about data being lost inside the buffer, but I'm not sure about his explanation. (BufferedReaders don't usually hold onto data inside their buffers. He may be thinking of a problem with BufferedWriters, which can lose data if the underlying stream is shut down prematurely.) [Never mind; I had misread Sergei's answer. The rest of this is valid AFAIK.]
I think you have a problem that's specific to your application. In your client code, you start reading as follows:
public static void recv(Socket socket){
try {
BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
//...
int numFiles = Integer.parseInt(in.readLine());
... and you proceed to use in for the start of the exchange. But then you switch to using the raw socket stream:
while(bytesRead < fileSize){
read = socket.getInputStream().read(buffer);
Because in is a BufferedReader, it's already going to have filled its buffer with up to 8192 bytes from the socket input stream. Any bytes that are in that buffer, and which you don't read from in, will be lost. Your app is hanging because it believes that the server is holding onto some bytes, but the server doesn't have them.
The solution is not to do byte-by-byte reads from the socket (ouch! your poor CPU!), but to use the BufferedReader consistently. Or, to use buffering with binary data, change the BufferedReader to a BufferedInputStream that wraps the socket's InputStream.
By the way, TCP is not as reliable as many people assume it to be. For example, when the server socket closes, it's possible for it to have written data into the socket which then gets lost as the socket connection is shutdown. Calling Socket.setSoLinger can help to prevent this problem.
EDIT: Also BTW, you're playing with fire by treating byte and character data as if they're interchangeable, as you do below. If the data really is binary, then the conversion to String risks corrupting the data. Perhaps you want to be writing into a BufferedOutputStream?
// Java is retarded and reading and writing operate with
// fundamentally different types. So we write a String of
// binary data.
fileWriter.write(new String(buffer));
bytesRead += read;
EDIT 2: Clarified (or attempted to clarify :-} the handling of binary vs. String data.
I want to read each line from a text file and store them in an ArrayList (each line being one entry in the ArrayList).
So far I understand that a BufferedInputStream writes to the buffer and only does another read once the buffer is empty which minimises or at least reduces the amount of operating system operations.
Am I correct - do I make sense?
If the above is the case, in what situations would anyone want to use DataInputStream? And finally, which of the two should I be using and why, or does it not matter?
Use a normal InputStream (e.g. FileInputStream) wrapped in an InputStreamReader and then wrapped in a BufferedReader - then call readLine on the BufferedReader.
DataInputStream is good for reading primitives, length-prefixed strings etc.
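The BufferedReader chain suggested in this answer could be sketched as follows. The readLines helper and the explicit UTF-8 charset are assumptions, and a ByteArrayInputStream stands in for the FileInputStream so the example runs without a file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ReadLinesDemo {
    // Reads every line of the stream into a list; closes the stream when done.
    static List<String> readLines(InputStream source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(source, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // readLine() strips the line terminator
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // For a real file, pass new FileInputStream(path) instead.
        byte[] text = "first\nsecond\r\nthird".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLines(new ByteArrayInputStream(text)));
    }
}
```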
The two classes are not mutually exclusive - you can use both of them if your needs suit.
As you picked up, BufferedInputStream is about reading in blocks of data rather than a single byte at a time. Note that it does not have a readLine() method; that convenience belongs to BufferedReader (DataInputStream has one too, but it is deprecated). BufferedInputStream is also used for peeking at data further in the stream and then rolling back to a previous part of the stream if required (see the mark() and reset() methods).
DataInputStream/DataOutputStream provides convenience methods for reading/writing certain data types. For example, it has a method to write/read a UTF String. If you were to do this yourself, you'd have to decide on how to determine the end of the String (i.e. with a terminator byte or by specifying the length of the string).
This is different from BufferedReader's readLine() which, as the name suggests, only returns a single line. writeUTF()/readUTF() deal with Strings - that string can have as many lines in it as it wants.
A BufferedReader (wrapped around an InputStreamReader) is suitable for most text processing purposes. If you're doing something special like trying to serialize the fields of a class to a file, you'd want to use DataInput/OutputStream as it offers greater control of the data at a binary level.
Hope that helps.
You can always use both:
final InputStream inputStream = ...;
final BufferedInputStream bufferedInputStream =
new BufferedInputStream(inputStream);
final DataInputStream dataInputStream =
new DataInputStream(bufferedInputStream);
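As a sketch of how that combination is used, the example below writes a few values with DataOutputStream and reads them back through a DataInputStream wrapped around a BufferedInputStream. The field names and values are made up for illustration, and in-memory streams stand in for a file or socket.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class DataStreamRoundTrip {
    // Writes an int, a double and a length-prefixed string in binary form.
    static byte[] write(int id, double price, String note) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);
            out.writeDouble(price);
            out.writeUTF(note); // 2-byte length prefix, then the encoded string
        }
        return bytes.toByteArray();
    }

    // Reads the values back in the same order they were written.
    static String read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(data)))) {
            int id = in.readInt();
            double price = in.readDouble();
            String note = in.readUTF(); // may contain newlines, unlike readLine()
            return id + "|" + price + "|" + note;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write(42, 9.5, "two\nlines");
        System.out.println(data.length + " bytes on the wire");
        System.out.println(read(data));
    }
}
```

The reader has to know the order and types of the fields in advance. DataInputStream adds no self-describing structure.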
InputStream: Base class for reading bytes from a stream (network or file); provides the ability to read bytes from the stream and detect the end of the stream.
DataInputStream: To read data directly as a primitive datatype.
BufferedInputStream: Reads data from the input stream and uses a buffer to speed up access to the data.
You should use DataInputStream in cases when you need to interpret the primitive types in a file written by a language other than Java in a platform-independent manner.
I would advocate using Jakarta Commons IO and the readLines() method (of whatever variety).
It'll look after buffering/closing etc. and give you back a list of text lines. I'll happily roll my own input stream wrapping with buffering etc., but nine times out of ten the Commons IO stuff works fine and is sufficient/more concise/less error prone etc.
The differences are:
DataInputStream works with binary data, while BufferedReader works with character data.
All primitive data types can be read using the corresponding methods of the DataInputStream class, while only string data can be read from a BufferedReader, and it needs to be parsed into the respective primitives.
DataInputStream is one of the filter streams (it extends FilterInputStream), while BufferedReader is not.
DataInputStream consumes less memory because it is a binary stream, whereas BufferedReader consumes more memory because it is a character stream.
The data to be handled is limited in DataInputStream, whereas the number of characters to be handled has wide scope in BufferedReader.