I'm reading about input/output streams in the Java Tutorials. The tutorial uses this example:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
public class CopyBytes {
    public static void main(String[] args) throws IOException {
        FileInputStream in = null;
        FileOutputStream out = null;
        try {
            in = new FileInputStream("xanadu.txt");
            out = new FileOutputStream("outagain.txt");
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
        } finally {
            if (in != null) {
                in.close();
            }
            if (out != null) {
                out.close();
            }
        }
    }
}
xanadu.txt File data:
In Xanadu did Kubla Khan
A stately pleasure-dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
Output to outagain.txt file:
In Xanadu did Kubla Khan
A stately pleasure-dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
Why do the writers use int c even if we are reading characters?
Why use -1 in the while condition?
How does out.write(c) convert the int back to characters?
1: Why does the writer use int c even though we are reading characters?
FileInputStream.read() returns one byte of data as an int. This works because a byte can be represented as an int without loss of precision. See this answer to understand why int is returned instead of byte.
2: Why use -1 in the while condition?
When the end of file is reached, -1 is returned.
3: How does out.write(c) convert the int back to characters, so that outagain.txt gets the same output?
FileOutputStream.write() takes the byte to write as an int parameter. An int in Java is always 32 bits, which spans more values than a byte, so the 24 high-order bits of the given int are ignored. What remains is an 8-bit value, i.e. a byte.
I suggest you carefully read the Javadocs for each of those methods. They answer all of your questions:
read:
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.
write:
Writes the specified byte to this output stream. The general contract for write is that one byte is written to the output stream. The byte to be written is the eight low-order bits of the argument b. The 24 high-order bits of b are ignored.
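To make the two contracts quoted above concrete, here is a minimal sketch that uses in-memory streams instead of files. The class name ReadWriteIntDemo and the helper copy are made up for illustration; only the read() and write(int) calls come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

class ReadWriteIntDemo {
    // Same int-based loop as CopyBytes, but over in-memory streams (illustrative helper).
    static byte[] copy(byte[] source) {
        ByteArrayInputStream in = new ByteArrayInputStream(source);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int c;
        while ((c = in.read()) != -1) { // c is in 0..255 here; -1 only at end of stream
            out.write(c);               // only the low 8 bits of c are written
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = {'I', 'n', (byte) 0xFF};
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        System.out.println(in.read()); // 73  ('I')
        System.out.println(in.read()); // 110 ('n')
        System.out.println(in.read()); // 255 (the byte 0xFF, not -1)
        System.out.println(in.read()); // -1  (end of stream)

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x141); // the 24 high-order bits are ignored; the low 8 bits are 0x41
        System.out.println((char) out.toByteArray()[0]); // A
    }
}
```

Note that the byte 0xFF comes back as 255, so it can never be confused with the end-of-stream value -1.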
Just read the docs.
Here are the docs for the read method:
http://docs.oracle.com/javase/7/docs/api/java/io/FileInputStream.html#read()
public int read()
throws IOException
Reads a byte of data from this input stream. This method blocks if no input is yet available.
Specified by:
read in class InputStream
Returns:
the next byte of data, or -1 if the end of the file is reached.
That int is the next byte of data.
Now, here are the answers.
1) read() returns the byte's value as an int. For ASCII text that value is the character's ASCII code, the same number you get when you assign a char to an int.
If you are interested, here is a list of chars and their ASCII codes: https://www.cs.cmu.edu/~pattis/15-1XX/common/handouts/ascii.html
2) read() returns -1 when the end of the file is reached, so the comparison checks whether any data is left.
3) When you pass an ASCII code to out.write(), it writes the corresponding byte to the file, and a text viewer shows that byte as the same character.
As I read following from the Oracle website, I get that the int variable holds a character value in its last 16 bits from inputStream.read().
So does it always waste 2 bytes ?
CopyCharacters is very similar to CopyBytes. The most important
difference is that CopyCharacters uses FileReader and FileWriter for
input and output in place of FileInputStream and FileOutputStream.
Notice that both CopyBytes and CopyCharacters use an int variable to
read to and write from. However, in CopyCharacters, the int variable
holds a character value in its last 16 bits; in CopyBytes, the int
variable holds a byte value in its last 8 bits.
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
public class CopyCharacters {
    public static void main(String[] args) throws IOException {
        FileReader inputStream = null;
        FileWriter outputStream = null;
        try {
            inputStream = new FileReader("xanadu.txt");
            outputStream = new FileWriter("characteroutput.txt");
            int c;
            while ((c = inputStream.read()) != -1) {
                outputStream.write(c);
            }
        } finally {
            if (inputStream != null) {
                inputStream.close();
            }
            if (outputStream != null) {
                outputStream.close();
            }
        }
    }
}
So does it always waste 2 bytes ?
Ermm ... yes. Either 2 bytes in the Reader case or 3 bytes in the InputStream case.
This wastage is necessary for the following reasons:
Both InputStream.read() and Reader.read() need to return a value to represent the "end of stream". As the javadocs say:
InputStream.read(): Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned.
Reader.read(): Returns the character read, as an integer in the range 0 to 65535 (0x00-0xffff), or -1 if the end of the stream has been reached.
The extra end-of-stream value means that the return type of read() cannot be (respectively) byte or char. (See also the last reason ...)
It turns out that the "wasted" 2 or 3 bytes are of no consequence. Even a trivial Java program is going to use megabytes of memory. (Indeed, even a trivial C program is going to use tens or hundreds of kilobytes of memory ... if you account for the library code that they use.)
Returning a byte or char probably wouldn't save memory anyway. On typical modern systems, local variables (even byte and char) are stored word-aligned on the stack. This is done because accessing memory at a word-aligned address is typically faster.
Replacing the -1 with an exception would be inefficient in another way. Throwing and catching exceptions in Java is significantly more expensive than a simple test for -1.
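The first reason can be checked directly. The sketch below simulates what a hypothetical byte-returning or char-returning read() would have to do with the end-of-stream value; the class and method names are invented for illustration and are not JDK APIs.

```java
class EndOfStreamSentinelDemo {
    // What a hypothetical "byte read()" would return for a given int result.
    static byte narrowToByte(int readResult) {
        return (byte) readResult;
    }

    // What a hypothetical "char read()" would return for a given int result.
    static char narrowToChar(int readResult) {
        return (char) readResult;
    }

    public static void main(String[] args) {
        int endOfStream = -1;
        int dataByte = 0xFF;   // a valid byte value (255) from InputStream.read()
        int dataChar = 0xFFFF; // a valid char value (65535) from Reader.read()

        // As ints, data and end-of-stream are distinguishable.
        System.out.println(dataByte == endOfStream); // false
        System.out.println(dataChar == endOfStream); // false

        // Narrowed to byte or char, they collide.
        System.out.println(narrowToByte(dataByte) == narrowToByte(endOfStream)); // true
        System.out.println(narrowToChar(dataChar) == narrowToChar(endOfStream)); // true
    }
}
```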
When I read from a file using readChar() in the RandomAccessFile class, I get unexpected output.
Instead of the desired character, ? is displayed.
package tesr;
import java.io.RandomAccessFile;
import java.io.IOException;
public class Test {
    public static void main(String[] args) {
        try {
            RandomAccessFile f = new RandomAccessFile("c:\\ankit\\1.txt", "rw");
            f.seek(0);
            System.out.println(f.readChar());
        }
        catch (IOException e) {
            System.out.println("dkndknf");
        }
        // TODO Auto-generated method stub
    }
}
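You probably intended readByte. readChar() reads two bytes and combines them, high byte first (UTF-16BE), into one 16-bit Java char. On a file that is not UTF-16BE text, the result is usually not the character you expect: it may be an unrelated character, an unassigned value, or half of a surrogate pair (two chars that together form one Unicode code point). When such a char cannot be converted for output, Java prints a question mark instead, which is what happened in your case.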
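As a rough illustration, assume the file starts with the two ASCII bytes for "Hi". The sketch below uses DataInputStream over an in-memory array, which follows the same DataInput.readChar() contract as RandomAccessFile; the class and helper names are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ReadCharDemo {
    // Reads the first two bytes as one big-endian 16-bit char, like readChar() on a file.
    static char readFirstChar(byte[] fileBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileBytes))) {
            return in.readChar();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] ascii = {'H', 'i'}; // 0x48 0x69, as an ASCII text file would store "Hi"
        char c = readFirstChar(ascii);
        System.out.println(Integer.toHexString(c)); // 4869, neither 'H' nor 'i'
    }
}
```

U+4869 is a CJK ideograph, so a console that cannot encode it may well show ? in its place.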
If you know in what encoding the file is in, then for a single byte encoding it is simple:
byte b = in.readByte();
byte[] bs = new byte[] { b };
String s = new String(bs, "Cp1252"); // Some single byte encoding
For the variable multi-byte UTF-8 it is also simple to identify a sequence of bytes:
single byte when high bit = 0
otherwise a continuation byte when high bits 10
otherwise a starting byte (with some special cases) telling the number of bytes by its high bits.
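The three rules above can be sketched with bit masks. This is an illustrative helper, not a full UTF-8 validator: it ignores the special cases (overlong forms, bytes 0xF8 and above) and the names are invented.

```java
import java.nio.charset.StandardCharsets;

class Utf8ByteKind {
    // Classifies one byte by its high bits.
    static String classify(byte b) {
        int v = b & 0xFF;                               // view the byte as 0..255
        if ((v & 0x80) == 0x00) return "single";        // 0xxxxxxx
        if ((v & 0xC0) == 0x80) return "continuation";  // 10xxxxxx
        return "start";                                 // 11xxxxxx
    }

    // Number of bytes in the sequence introduced by this byte, or -1 if it cannot start one.
    static int sequenceLength(byte b) {
        int v = b & 0xFF;
        if ((v & 0x80) == 0x00) return 1; // 0xxxxxxx
        if ((v & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((v & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((v & 0xF8) == 0xF0) return 4; // 11110xxx
        return -1;                        // continuation byte or invalid start byte
    }

    public static void main(String[] args) {
        // "a" followed by e-acute encodes to the bytes 0x61, 0xC3, 0xA9 in UTF-8.
        for (byte b : "a\u00e9".getBytes(StandardCharsets.UTF_8)) {
            System.out.println(Integer.toHexString(b & 0xFF) + " " + classify(b));
        }
    }
}
```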
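For UTF-16LE and UTF-16BE, each code unit is 2 bytes long, so the file position must be a multiple of 2 and you read 2 bytes at a time:
byte[] bs = new byte[2];
in.readFully(bs); // unlike read(bs), readFully fills the whole array or throws EOFException
String s = new String(bs, StandardCharsets.UTF_16LE);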
You almost certainly have a character encoding problem. It is not possible to simply read characters from a file. What must happen is that an appropriate sequence of bytes is read, and then those bytes are interpreted according to a character encoding scheme to translate them to a character. When you want to read a file as text, Java must be told, perhaps implicitly, which character encoding to use.
If you tell Java the wrong encoding you will get gibberish. If you pick an arbitrary point in a file and start reading, and that location is not the start of the encoding of a character, you will get gibberish. One or both of those has happened in your case.
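Both failure modes are easy to reproduce in memory. The following sketch (class and method names are hypothetical) decodes the UTF-8 bytes of a single e-acute with the right charset, with the wrong charset, and starting from the middle of the sequence.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Interprets raw bytes as text using the given encoding.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8); // 0xC3 0xA9

        String right = decode(utf8, StandardCharsets.UTF_8);
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(right.length()); // 1: the original character
        System.out.println(wrong.length()); // 2: each byte became its own character

        // Starting in the middle of a multi-byte sequence: only the continuation byte.
        String midway = decode(new byte[] {(byte) 0xA9}, StandardCharsets.UTF_8);
        System.out.println(midway.equals("\uFFFD")); // true: the replacement character
    }
}
```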
I've this code:
InputStream is = socket.getInputStream();
int b;
while ((b = is.read()) != -1)
{
System.out.println(b);
}
A byte has a range of -128 to +127.
But one of the printed values is 210.
Is this the result of converting the read byte to an int?
(So that the negative byte becomes a positive int?)
If so, can I do the same (with an OutputStream) by converting an int to a byte?
Thanks,
Martijn
Actually, read returns an int:
public abstract int read() throws IOException
so the byte is returned as an unsigned value (0 to 255) stored in an int.
As stated in documentation:
Reads the next byte of data from the
input stream. The value byte is
returned as an int in the range 0 to
255. If no byte is available because the end of the stream has been
reached, the value -1 is returned.
Think about the fact that if it's a signed byte then -1 couldn't be used as end of stream value.
For OutputStream you have
public abstract void write(int b) throws IOException
and, as stated in the documentation, the implementation takes the 8 low-order bits of the int passed in.
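For the value 210 from the question, the two conversions look like this. This is a small sketch with invented helper names; Byte.toUnsignedInt (Java 8 and later) does the same as the mask.

```java
class UnsignedByteDemo {
    // Signed byte (-128..127) to the unsigned value (0..255) that read() would return.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Keeps the 8 low-order bits of the int, as write(int) does.
    static byte toByte(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        byte b = toByte(210);
        System.out.println(b);                       // -46
        System.out.println(toUnsigned(b));           // 210
        System.out.println(Byte.toUnsignedInt(b));   // 210
    }
}
```

So passing 210 to OutputStream.write(int) and passing the byte -46 widened to an int both write the same 8 bits.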
In the example given in the Oracle Java Tutorial, they read characters as integers.
Why and how does that work?
try {
    inputStream = new FileReader("xanadu.txt");
    outputStream = new FileWriter("characteroutput.txt");
    int c;
    while ((c = inputStream.read()) != -1) {
        outputStream.write(c);
    }
If read() returned a char, there would be no value left over to signal the end of the file.
By using the larger type int, it's possible to represent every possible character AND one more value that means end of file.
This is because characters ARE integers. Each character has a numeric Unicode value.
Basically a char is an int. Try the following:
char c = 'c';
int i = c;
This will not cause a compile error.
Behind the scenes in Java, a char is just a 16-bit unsigned value (0 to 65535). An int is a 32-bit signed value.
Every char value is therefore also a valid int value. The values are UTF-16 code units, and the first 128 of them match the ASCII table.
Because of this relationship, the language lets you assign a char to an int implicitly; going the other way requires an explicit cast.
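A quick sketch of both directions of the conversion (the class and helper names are only for illustration):

```java
class CharIntDemo {
    // Widening char to int is implicit and never loses information.
    static int codeOf(char c) {
        return c;
    }

    // Narrowing int to char needs an explicit cast and keeps only the low 16 bits.
    static char charOf(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('c'));                // 99
        System.out.println(charOf(99));                 // c
        System.out.println(codeOf(Character.MAX_VALUE)); // 65535
        // char back = 99 + args.length; // would not compile without a cast
    }
}
```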
Well, if you read the documentation for Reader/Writer you can see the following explanation:
Writer Class - write Method
Writes a single character. The character to be written is contained
in the 16 low-order bits of the given integer value; the 16
high-order bits are ignored.
And the code simply does:
public void write(int c) throws IOException {
    synchronized (lock) {
        if (writeBuffer == null){
            writeBuffer = new char[writeBufferSize];
        }
        writeBuffer[0] = (char) c;
        write(writeBuffer, 0, 1);
    }
}
So, in the case of Writer, as far as I can see, this could have been done with a char parameter.
The Reader, on the other hand, has the responsibility in its read method of returning either a character or the end-of-stream indicator.
The documentation says:
Reader Class read Method
The character read, as an integer in the range 0 to 65535
or -1 if the end of the stream has been reached.
As such, a data type bigger than just a char is needed, and in this case int is used.
And it is implemented as follows:
public int read() throws IOException {
    char cb[] = new char[1];
    if (read(cb, 0, 1) == -1)
        return -1;
    else
        return cb[0];
}
So, this second case justifies the use of a bigger data type.
The reason why they use an int in both classes could be just a matter of consistency.
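The "16 high-order bits are ignored" part of the Writer contract can be observed with a StringWriter. This is a minimal sketch; the class name and the writeOne helper are made up.

```java
import java.io.StringWriter;

class WriterLowBitsDemo {
    // Writes one int through Writer.write(int) and returns what ended up in the output.
    static String writeOne(int value) {
        StringWriter writer = new StringWriter();
        writer.write(value); // only the 16 low-order bits are used
        return writer.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeOne(0x41));    // A
        System.out.println(writeOne(0x10041)); // A (the high-order bits 0x1 are dropped)
    }
}
```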