Confusion about the Character Streams sample code on the Oracle website - java

Reading the following on the Oracle website, I understand that the int variable returned by inputStream.read() holds a character value in its last 16 bits.
So does it always waste 2 bytes?
CopyCharacters is very similar to CopyBytes. The most important
difference is that CopyCharacters uses FileReader and FileWriter for
input and output in place of FileInputStream and FileOutputStream.
Notice that both CopyBytes and CopyCharacters use an int variable to
read to and write from. However, in CopyCharacters, the int variable
holds a character value in its last 16 bits; in CopyBytes, the int
variable holds a byte value in its last 8 bits.
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class CopyCharacters {
    public static void main(String[] args) throws IOException {
        FileReader inputStream = null;
        FileWriter outputStream = null;
        try {
            inputStream = new FileReader("xanadu.txt");
            outputStream = new FileWriter("characteroutput.txt");
            int c;
            while ((c = inputStream.read()) != -1) {
                outputStream.write(c);
            }
        } finally {
            if (inputStream != null) {
                inputStream.close();
            }
            if (outputStream != null) {
                outputStream.close();
            }
        }
    }
}

So does it always waste 2 bytes ?
Ermm ... yes. Either 2 bytes in the Reader case or 3 bytes in the InputStream case.
This wastage is necessary for the following reasons:
Both InputStream.read() and Reader.read() need to return a value to represent the "end of stream". As the javadocs say:
InputStream.read(): Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned.
Reader.read(): Returns the character read, as an integer in the range 0 to 65535 (0x00-0xffff), or -1 if the end of the stream has been reached.
The extra end-of-stream value means that the return type of read() cannot be (respectively) byte or char. (See also the last reason ...)
It turns out that the "wasted" 2 or 3 bytes are of no consequence. Even a trivial Java program is going to use megabytes of memory. (Indeed, even a trivial C program is going to use tens or hundreds of kilobytes of memory ... if you account for the library code that they use.)
Returning a byte or char probably wouldn't save memory anyway. On typical modern systems, local variables (even byte and char) are stored word-aligned on the stack, because accessing memory at a word-aligned address is typically faster.
Replacing the -1 with an exception would be inefficient in another way. Throwing and catching exceptions in Java is significantly more expensive than a simple test for -1.
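To see why the end-of-stream value needs a type wider than char, here is a minimal sketch. The class and method names are made up for illustration, and a StringReader stands in for a file. It reads the largest possible char followed by end of stream and shows that the two results are different ints.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the Oracle tutorial.
class EndOfStreamSentinel {
    // Calls read() a fixed number of times and returns every result,
    // including the -1 that marks end of stream.
    static int[] readValues(String text, int calls) {
        int[] values = new int[calls];
        try (Reader reader = new StringReader(text)) {
            for (int i = 0; i < calls; i++) {
                values[i] = reader.read();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        // The second character is the largest char value, 0xFFFF.
        for (int v : readValues("A\uFFFF", 3)) {
            System.out.println(v); // prints 65, then 65535, then -1
        }
        // Narrowing -1 to char gives 0xFFFF, so a char return type
        // could not tell "end of stream" apart from a real character.
        System.out.println((char) -1 == '\uFFFF'); // prints true
    }
}
```

As an int, 65535 and -1 are distinct; squeezed into a char they would collide.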

Related

How FileInputStream and FileOutputStream Works in Java?

I'm reading about input/output streams in the Java Tutorials. The tutorial's authors use this example:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class CopyBytes {
    public static void main(String[] args) throws IOException {
        FileInputStream in = null;
        FileOutputStream out = null;
        try {
            in = new FileInputStream("xanadu.txt");
            out = new FileOutputStream("outagain.txt");
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
        } finally {
            if (in != null) {
                in.close();
            }
            if (out != null) {
                out.close();
            }
        }
    }
}
xanadu.txt File data:
In Xanadu did Kubla Khan
A stately pleasure-dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
Output to outagain.txt file:
In Xanadu did Kubla Khan
A stately pleasure-dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
Why do the writers use int c even if we are reading characters?
Why use -1 in while condition?
How out.write(c); method convert int to again characters?
1: Now I want to ask why writer use int c? even we are reading characters.
FileInputStream.read() returns one byte of data as an int. This works because a byte can be represented as an int without loss of precision. See this answer to understand why int is returned instead of byte.
2: The second why use -1 in while condition?
When the end of file is reached, -1 is returned.
3: How out.write(c); method convert int to again characters? that provide same output in outagain.txt file
FileOutputStream.write() takes a byte parameter as an int. Since an int spans over more values than a byte, the 24 high-order bits of the given int are ignored, making it a byte-compatible value: an int in Java is always 32 bits. By removing the 24 high-order bits, you're down to a 8 bits value, i.e. a byte.
I suggest you carefully read the Javadocs for each of those methods. As a reference, they answer all of your questions:
read:
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.
write:
Writes the specified byte to this output stream. The general contract for write is that one byte is written to the output stream. The byte to be written is the eight low-order bits of the argument b. The 24 high-order bits of b are ignored.
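The "24 high-order bits are ignored" rule quoted above can be checked with a small sketch. The class name and helper are illustrative only, and a ByteArrayOutputStream is used in place of a FileOutputStream so the written byte can be inspected directly.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo class for OutputStream.write(int).
class LowOrderByteDemo {
    // Writes one int through write(int) and returns the single byte that was stored.
    static byte writeAndGet(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(value); // only the eight low-order bits are kept
        return out.toByteArray()[0];
    }

    public static void main(String[] args) {
        System.out.println(writeAndGet(0x41));       // prints 65
        System.out.println(writeAndGet(0x12345641)); // prints 65 as well
    }
}
```

Both calls store the same byte, because only the low-order 0x41 survives.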
Just read the docs.
here is the read method docs
http://docs.oracle.com/javase/7/docs/api/java/io/FileInputStream.html#read()
public int read()
throws IOException
Reads a byte of data from this input stream. This method blocks if no input is yet available.
Specified by:
read in class InputStream
Returns:
the next byte of data, or -1 if the end of the file is reached.
That int holds your next byte of data.
Now, here are the answers.
1) read() returns the numeric value of the byte as an int. For plain ASCII text such as this file, that number is the character's ASCII code.
If you are interested, here is the list of chars and their ASCII codes: https://www.cs.cmu.edu/~pattis/15-1XX/common/handouts/ascii.html
2) -1 is returned if the end of the file is reached, so it's a check for whether any data is left.
3) When you pass that code to out.write(c), the corresponding byte is written to the file, and a text editor shows it as the same char.

Get loading percent from InputStream .read method

I have a problem parsing an InputStream and getting the loading percentage of its data. My method needs to read the InputStream, put its content into a StringBuffer, track the total number of bytes read, and return a String built from the StringBuffer.
private String processPercent(InputStream content, HttpResponse response) throws IOException
{
    InputStream in = content;
    int totalBytes = Integer.parseInt(response.getFirstHeader("Content-Length").getValue());
    int processedByte;
    int loaded = 0;
    StringBuffer sb = new StringBuffer();
    while((processedByte = in.read()) != -1)
    {
        sb.append((char) processedByte);
        if(this.asyncTask instanceof IProgressPercent)
        {
            lastProcessed = processedByte;
            loaded += processedByte;
            float percent = ((100*loaded) / totalBytes);
            this.progressPercent = (int)percent;
            this.asyncTask.doProgress(this.progressPercent);
        }
    }
    in.close();
    return new String(sb);
}
The problem is when I display the value of percent variable, I get a value higher than 100, so I think that's a calculation problem.
Any idea ? Thanks.
Replace
loaded += processedByte;
with
++loaded;
and move it before the if, so the counter goes up by 1 for each byte read rather than by the byte's value.
You have several problems here.
First of all, you are using an InputStream; an InputStream reads bytes. And a char is not two bytes. A char is a UTF-16 code unit. See here.
Second, you only read byte by byte. If your content length is 1M, you will call this code 1M times. Is this really what you want?
You should change your code in the following ways:
first of all, use an InputStreamReader over your InputStream -- and initiate it with the correct charset;
use this reader's read() method which reads into a preallocated char array; use the number of chars read to update your progress bar;
add the contents of your char[] to your StringBuffer (which should really be a StringBuilder).
Because decoding can produce fewer chars than there are bytes, your counter will be a little off; this can be cured with more sophisticated mechanisms, but that requires quite a bit of code.
Of course, if the size is not too large, an easier way is to download all the contents directly into a ByteArrayOutputStream and then decode that ByteArrayOutputStream's content into a String.
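A minimal sketch of that last approach follows, under some assumptions: the content is UTF-8, the total size is known and greater than zero, and progress is reported through a plain IntConsumer callback. The class name, method name, and callback are all hypothetical, not part of the question's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.IntConsumer;

// Hypothetical helper; names are illustrative only.
class ProgressReader {
    // Reads the stream in chunks, counts bytes (not byte values),
    // and decodes everything once at the end. Assumes totalBytes > 0.
    static String readWithProgress(InputStream in, long totalBytes, IntConsumer onPercent) {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        long loaded = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                collected.write(buffer, 0, n);
                loaded += n; // number of bytes read in this chunk
                onPercent.accept((int) (100 * loaded / totalBytes));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Assumption: the content is UTF-8; use the charset the server declares.
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = "In Xanadu did Kubla Khan".getBytes(StandardCharsets.UTF_8);
        String text = readWithProgress(new ByteArrayInputStream(data), data.length,
                percent -> System.out.println(percent + "%"));
        System.out.println(text);
    }
}
```

Because progress is counted in bytes and decoding happens once at the end, the percentage stays consistent with Content-Length even when some characters take more than one byte.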
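Instead of
float percent = ((100*loaded) / totalBytes);
use
float percent = 100f * loaded / totalBytes;
The original expression is evaluated entirely in int arithmetic, so 100*loaded can overflow for large downloads. Note that 100 * (loaded / totalBytes) would not help while both operands are ints: the integer division yields 0 until the download completes.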

Why does Java I/O use "int" and not "byte"/"char" when reading/writing a single byte/character? [duplicate]

I've this code:
InputStream is = socket.getInputStream();
int b;
while ((b = is.read()) != -1)
{
System.out.println(b);
}
A byte's range is -128 to +127.
But one of the printed bytes is 210.
Is this the result of converting the read byte to an int?
(So that the negative byte becomes a positive int)
If so, can I do the same (with an OutputStream) by converting an int to a byte?
Thanks,
Martijn
Actually, read() returns an int:
public abstract int read() throws IOException
so the byte is returned as an unsigned value (0 to 255) stored in an int.
As stated in documentation:
Reads the next byte of data from the
input stream. The value byte is
returned as an int in the range 0 to
255. If no byte is available because the end of the stream has been
reached, the value -1 is returned.
Think about the fact that if it's a signed byte then -1 couldn't be used as end of stream value.
For OutputStream you have
public abstract void write(int b) throws IOException
and, as stated in the documentation, the implementation takes the 8 low-order bits of the int passed in.
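The two conversions can be sketched as follows. The class and method names are made up for illustration. Masking with 0xFF gives the 0 to 255 value that read() returns, and a cast to byte goes the other way.

```java
// Hypothetical demo class for byte <-> int conversion.
class UnsignedByteDemo {
    // Converts a signed byte (-128..127) to the 0..255 value read() would return.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Narrows an int to a byte, keeping only the eight low-order bits.
    static byte toByte(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        byte b = toByte(210);
        System.out.println(b);             // prints -46
        System.out.println(toUnsigned(b)); // prints 210
    }
}
```

So 210 read from the stream and the byte -46 are the same eight bits seen through different types.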

Why does the character stream read ints?

In the example given in the Oracle Java Tutorial, they read characters as integers.
Why and how does that work?
try {
inputStream = new FileReader("xanadu.txt");
outputStream = new FileWriter("characteroutput.txt");
int c;
while ((c = inputStream.read()) != -1) {
outputStream.write(c);
}
If read() returned a char, there would be no value left over to signal end of file.
By using the larger type int, it's possible to represent every possible character AND one more value that means end of file.
This is because characters ARE integers under the hood. Each character has a numeric Unicode value.
A char widens to an int implicitly. Try the following:
char c = 'c';
int i = c;
This will not cause a compile error.
Behind the scenes in Java, a char is a 16-bit unsigned value and an int is a 32-bit signed value.
Every char value (0 to 65535) fits in an int; those values are UTF-16 code units, and the first 128 coincide with the ASCII table.
Because of this relationship, the language lets a char be assigned to an int directly, while going from int back to char requires an explicit cast.
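A short sketch of both directions, with illustrative names: char to int needs no cast, while int to char needs an explicit one and keeps only the low 16 bits.

```java
// Hypothetical demo class for char <-> int conversion.
class CharIntDemo {
    // Implicit widening: every char value fits in an int.
    static int toCodeUnit(char c) {
        return c;
    }

    // Explicit narrowing: the high 16 bits of the int are discarded.
    static char toChar(int value) {
        return (char) value;
    }

    public static void main(String[] args) {
        System.out.println(toCodeUnit('c')); // prints 99
        System.out.println(toChar(99));      // prints c
        System.out.println(toChar(0x10063)); // prints c (high bits dropped)
    }
}
```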
Well, if you read the documentation for Reader/Writer you can see the following explanation:
Writer Class - write Method
Writes a single character. The character to be written is contained
in the 16 low-order bits of the given integer value; the 16
high-order bits are ignored.
And the code simply does:
public void write(int c) throws IOException {
    synchronized (lock) {
        if (writeBuffer == null){
            writeBuffer = new char[writeBufferSize];
        }
        writeBuffer[0] = (char) c;
        write(writeBuffer, 0, 1);
    }
}
So, in the case of Writer, as far as I can see, this could have been done with a char data type.
The Reader, on the other hand, has the responsibility in its read method of returning either a character or the end-of-stream indicator.
The documentation says:
Reader Class read Method
The character read, as an integer in the range 0 to 65535
or -1 if the end of the stream has been reached.
As such, a data type bigger than just a char is needed, and in this case int is used.
And it is implemented as follows:
public int read() throws IOException {
    char cb[] = new char[1];
    if (read(cb, 0, 1) == -1)
        return -1;
    else
        return cb[0];
}
So, this second case justifies the use of a bigger data type.
The reason why they use an int in both classes could be just a matter of consistency.
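The read/write pairing described above can be tried in memory. This is only a sketch with made-up names, using StringReader and StringWriter instead of the tutorial's file classes. It copies text through an int variable and also shows that Writer.write(int) drops the 16 high-order bits.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical demo class; mirrors the tutorial's copy loop in memory.
class CharCopyDemo {
    // Copies text one character at a time through an int, as CopyCharacters does.
    static String copy(String text) {
        StringWriter out = new StringWriter();
        try (Reader in = new StringReader(text)) {
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Writes a single int and returns what ended up in the writer.
    static String writeSingle(int value) {
        StringWriter out = new StringWriter();
        out.write(value); // the 16 high-order bits are ignored
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(copy("In Xanadu did Kubla Khan")); // prints the same text
        System.out.println(writeSingle(0x10041));             // prints A
    }
}
```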

Determining and printing file size in Java

The method below returns file size as 2. Since it is long, I'm assuming the file size java calculates is 2*64 bits. But actually I saved a 32 bit int + a 16 bit char = 48 bits. Why does Java do this conversion? Also, does Java implicitly store everything as long in the file no matter if char or int ? How do I get the accurate size of 48 bits ?
public static void main(String[] args)
{
    File f = new File("C:/sam.txt");
    int a= 42;
    char c= '.';
    try {
        try {
            f.createNewFile();
        } catch (IOException e) {
            e.printStackTrace();
        }
        PrintWriter pw = new PrintWriter(f);
        pw.write(a);
        pw.write(c);
        pw.close();
        System.out.println("file size:"+f.length());
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}
No. You wrote two characters. Writers are used for textual data, not for binary data. The documentation of write(int) says:
Writes a single character.
Since the default character encoding of your platform stores those two characters as a single byte (each), the file length is 2 (2 bytes: the length of a file is measured in bytes, as the documentation says). Open the file with a text editor, and see what's in there.
The Java API doc is really useful to know what a class or method does. You should read it.
Both calls to write are writing a char, which is 16 bits in memory, but since
new PrintWriter(f)
uses the default character set encoding (probably ASCII or UTF-8 on your system), it results in 2 bytes being written.
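To compare the two kinds of output, here is a sketch that writes the same int and char once as text and once as binary. The class and method names are hypothetical, the output goes to an in-memory ByteArrayOutputStream rather than a file, and UTF-8 is chosen explicitly instead of relying on the platform default. DataOutputStream is one standard way to get the 48 bits (6 bytes) the question expected.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class comparing text and binary output sizes.
class WrittenSizeDemo {
    // Text output: write(int) and write(char) each emit one character.
    static int textSize(int a, char c) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintWriter pw = new PrintWriter(new OutputStreamWriter(bytes, StandardCharsets.UTF_8));
        pw.write(a);
        pw.write(c);
        pw.close();
        return bytes.size();
    }

    // Binary output: writeInt emits 4 bytes, writeChar emits 2 bytes.
    static int binarySize(int a, char c) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(a);
            out.writeChar(c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println(textSize(42, '.'));   // prints 2
        System.out.println(binarySize(42, '.')); // prints 6
    }
}
```

The text version stores the characters '*' (code 42) and '.', one byte each in UTF-8, which matches the file size of 2 reported in the question.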
