Hello I have the following code:
int i=12345;
DataOutputStream dos=new DataOutputStream(new FileOutputStream("Raw.txt"));
dos.write(i);
dos.close();
System.out.println(new File("Raw.txt").length());
The file size is being reported as 1 byte. Why is it not 4 bytes when an integer is 4 bytes long?
Thanks
Because you only wrote one byte to it. See the Javadoc for DataOutputStream.write(int). It writes a byte, not an int.
While the DataOutputStream.write method takes an int argument, it actually only writes the bottom 8 bits of that argument. So you actually wrote only one byte ... and hence the file is one byte long.
If you want to write the entire int you should use the writeInt(int) method.
The underlying reason for this strangeness is (I believe) that the write(int) method is defined to be consistent with OutputStream.write(int), which in turn is defined to be consistent with InputStream.read(). InputStream.read() reads a byte and returns it as an int ... with the value -1 used to indicate the end-of-stream condition.
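A minimal sketch contrasting the two calls on the same value; the file names are just examples, not from the original question:

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class WriteIntDemo {
    public static void main(String[] args) throws IOException {
        int i = 12345;

        // write(int) stores only the low-order 8 bits: one byte on disk.
        try (DataOutputStream dos = new DataOutputStream(new FileOutputStream("raw-byte.bin"))) {
            dos.write(i);
        }
        System.out.println(new File("raw-byte.bin").length()); // 1

        // writeInt(int) stores all four bytes, high byte first.
        try (DataOutputStream dos = new DataOutputStream(new FileOutputStream("raw-int.bin"))) {
            dos.writeInt(i);
        }
        System.out.println(new File("raw-int.bin").length()); // 4
    }
}
```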
Related
The read() method returns an int that represents the next byte of data, while the read(byte[] b) method does not return the data itself; it assigns the byte values to the array passed as an argument.
I have run some tests with an image file, taking two approaches:
Print the results returned by the read() method until the result is -1 (which means the end of the file has been reached).
Create an array of bytes, pass it as an argument to the read(byte[] b) method, and print the numbers that have been assigned to that array.
I have noticed that the results in the two cases are different: in the second case, as the results are of byte type, the numbers were never greater than 127 or less than -128; while in the first case, I found numbers greater than 200, for example.
Shouldn't the numbers be the same in both cases, given that the file is the same and those numbers represent its data?
I also used a FileOutputStream to write the data of the file into another new file, and in both cases the new file had the same bytes and looked the same (as I said, it was an image).
Thank you.
Since Java has only signed primitive types, read(byte[] b) yields regular signed bytes, i.e. -128 to 127. However, read() returns an int so that it can indicate end of stream with -1; for actual data it returns unsigned byte values from 0 to 255.
byte b = (byte) in.read(); // provided that the stream has data left
would give you a signed byte matching the values you've gotten in your byte[] b.
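A small self-contained sketch of the difference, using an in-memory stream with one assumed sample byte (0xC8):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.IOException;

public class SignedUnsignedDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = { (byte) 0xC8 };  // 0xC8 is 200 unsigned, -56 signed

        // read() returns the byte as an int in 0..255 (or -1 at end of stream).
        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println(in.read());        // 200
        }

        // read(byte[]) fills the array with signed bytes in -128..127.
        try (InputStream in = new ByteArrayInputStream(data)) {
            byte[] buf = new byte[1];
            in.read(buf);
            System.out.println(buf[0]);           // -56
            System.out.println(buf[0] & 0xFF);    // 200: masking recovers the unsigned value
        }
    }
}
```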
There is a strange restriction in java.io.DataOutputStream.writeUTF(String str) method, which limits the size of an UTF-8 encoded string to 65535 bytes:
if (utflen > 65535)
    throw new UTFDataFormatException(
        "encoded string too long: " + utflen + " bytes");
It is strange, because:
there is no information about this restriction in the JavaDoc of this method
this restriction can easily be worked around by copying and modifying the internal static int writeUTF(String str, DataOutput out) method of this class
there is no such restriction in the opposite method, java.io.DataInputStream.readUTF()
Given the above, I cannot understand the purpose of such a restriction in the writeUTF method. What have I missed or misunderstood?
The Javadoc of DataOutputStream.writeUTF states:
First, two bytes are written to the output stream as if by the
writeShort method giving the number of bytes to follow. This value
is the number of bytes actually written out, not the length of the
string.
Two bytes means 16 bits: the maximum unsigned integer one can encode in 16 bits is 2^16 - 1 == 65535.
DataInputStream.readUTF has the exact same restriction, because it first reads the number of UTF-8 bytes to consume, in the form of a 2-byte integer, which again can only have a maximum value of 65535.
writeUTF first writes two bytes with the length, which has the same result as calling writeShort with the length and then writing the UTF-encoded bytes. writeUTF doesn't actually call writeShort - it builds up a single byte[] with both the 2-byte length and the UTF bytes. But that is why the Javadoc says "as if by the writeShort method" rather than just "by the writeShort method".
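One common workaround, if both sides are under your control, is to write the byte count yourself as a 4-byte int and then the raw UTF-8 bytes. The helper names below (writeLongUTF, readLongUTF) are made up for this sketch, and note the result is plain UTF-8, not the modified UTF-8 that writeUTF produces, so it must be read with the matching reader, not readUTF:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LongUtfDemo {
    static void writeLongUTF(String s, DataOutputStream out) throws IOException {
        byte[] utf = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(utf.length);   // 4-byte length: up to Integer.MAX_VALUE bytes
        out.write(utf);
    }

    static String readLongUTF(DataInputStream in) throws IOException {
        byte[] utf = new byte[in.readInt()];
        in.readFully(utf);
        return new String(utf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Build a string whose UTF-8 encoding exceeds 65535 bytes.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) sb.append('x');
        String big = sb.toString();

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeLongUTF(big, new DataOutputStream(buf));
        String back = readLongUTF(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(back.equals(big));   // true
    }
}
```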
According to the API, the method write(int b) takes an int representing a byte, so that when EOF comes it can return -1.
However it's possible doing the following thing:
FileOutputStream fi = new FileOutputStream(file);
fi.write(100000);
I expected this not to compile, as the number exceeds the byte range.
How does the JVM interpret it exactly?
Thanks in advance.
From the OutputStream.write(int) doc:
Writes the specified byte to this output stream. The general contract for write is that one byte is written to the output stream. The byte to be written is the eight low-order bits of the argument b. The 24 high-order bits of b are ignored.
Emphasis mine.
Note that the method takes an int. And since 100000 is a valid integer literal, there is no reason for it not to compile.
Where did you read that part about EOF and -1?
The method just writes one byte, which for some reason is passed along as an int.
Writes the specified byte to this output stream. The general contract for write is that one byte is written to the output stream. The byte to be written is the eight low-order bits of the argument b. The 24 high-order bits of b are ignored.
I expected to not compile as the number exceeds the byte range
No, this will compile okay. The compiler just looks for an int. (A long would not compile).
Everything except the lowest 8 bits will be ignored.
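A short sketch of what actually lands in the stream; ByteArrayOutputStream stands in for the FileOutputStream so the result is easy to inspect:

```java
import java.io.ByteArrayOutputStream;

public class LowByteDemo {
    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(100000);                       // 100000 == 0x186A0; compiles fine

        byte[] bytes = out.toByteArray();
        System.out.println(bytes.length);        // 1: only one byte was written
        System.out.println(bytes[0] & 0xFF);     // 160, i.e. 0xA0, the low-order 8 bits
        // out.write(100000L); would NOT compile: there is no write(long) overload
    }
}
```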
I have a file created by a C++ program, in an encrypted format, and I want to read it in my Java program. The decryption algorithm operates on bytes (unsigned char / BYTE in C/C++), and I am using the same algorithm as in my C/C++ program; it applies ^, %, * and - operations to bytes. But Java's byte type is signed, which is causing problems in the decryption. How can I read the file, or process the data I've read, one unsigned byte at a time?
thanks in advance.
byte b = ...;      // as read from the file
int i = b & 0xFF;  // now in the range 0 to 255
// perform operations on i as required
The standard method InputStream.read() reads one byte and fits it into an int, so in practice it is an unsigned byte. There are no unsigned primitive data types in Java, so the only approach is to fit it into a wider primitive.
That being said, for bitwise operations such as ^ the raw bytes behave the same no matter whether they are interpreted as signed or unsigned (0xFF can be 255 or -1). Arithmetic operations such as %, * and -, however, do depend on the sign, so convert each byte to an int with & 0xFF before applying them. Either way, you should not perform encryption/decryption operations over anything other than raw bytes.
First, InputStream.read() returns an int, but it holds a byte; it uses an int so that -1 can be returned if the EOF is reached. If the int is not -1, you can cast it to byte.
Second, there are read() methods that allow storing the bytes directly in a byte[].
And last, if you are going to use the file as a byte[] (and it is not too big), it may be worth copying the data from the FileInputStream into a ByteArrayOutputStream. You can get the resulting byte[] from the latter object (note: do not use the .read() method; use .read(byte[], int, int) for performance).
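A sketch of that copy-into-ByteArrayOutputStream approach, using the buffered read(byte[], int, int) overload; the sample file and its three bytes are invented for the demonstration:

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadAllBytesDemo {
    static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        // read(byte[], int, int) returns the count read, or -1 at end of stream.
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Create a small sample file for the demonstration.
        File f = File.createTempFile("demo", ".bin");
        f.deleteOnExit();
        try (FileOutputStream fos = new FileOutputStream(f)) {
            fos.write(new byte[] { 1, 2, (byte) 0xFF });
        }

        try (FileInputStream fis = new FileInputStream(f)) {
            byte[] data = readAllBytes(fis);
            System.out.println(data.length);     // 3
            System.out.println(data[2] & 0xFF);  // 255: mask for the unsigned value
        }
    }
}
```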
Since there is no unsigned primitive type in Java, I think what you can do is to convert signed byte into integer (which will virtually be unsigned because the integer will always be positive). You can follow the code in here: Can we make unsigned byte in Java for the conversion.
Why do some methods that write bytes/chars to streams take an int instead of a byte/char?
Someone told me in case of int instead of char:
because char in Java is just 2 bytes long, which is OK for most character symbols already in use, but certain character symbols (Chinese or whatever) are represented in more than 2 bytes, and hence we use int instead.
How close is this explanation to the truth?
EDIT:
I use the stream word to represent Binary and character streams (not Just Binary streams)
Thanks.
Someone told me in case of int instead of char: because char in java is just 2 bytes length, which is OK with most character symbols already in use, but for certain character symbols (chinese or whatever), the character is being represented in more than 2 bytes, and hence we use int instead.
Assuming that at this point you are talking specifically about the Reader.read() method, the statement from "someone" that you have recounted is in fact incorrect.
It is true that some Unicode codepoints have values greater than 65535 and therefore cannot be represented as a single Java char. However, the Reader API actually produces a sequence of Java char values (or -1), not a sequence of Unicode codepoints. This is clearly stated in the javadoc.
If your input includes a (suitably encoded) Unicode code point that is greater than 65535, then you will actually need to call the read() method twice to see it. What you will get will be a UTF-16 surrogate pair; i.e. two Java char values that together represent the codepoint. In fact, this fits in with the way that the Java String, StringBuilder and StringBuffer classes all work; they all use a UTF-16 based representation ... with embedded surrogate pairs.
The real reason that Reader.read() returns an int not a char is to allow it to return -1 to signal that there are no more characters to be read. The same logic explains why InputStream.read() returns an int not a byte.
Hypothetically, I suppose that the Java designers could have specified that the read() methods throw an exception to signal the "end of stream" condition. However, that would have just replaced one potential source of bugs (failure to test the result) with another (failure to deal with the exception). Besides, exceptions are relatively expensive, and an end of stream is not really an unexpected / exceptional event. In short, the current approach is better, IMO.
(Another clue to the 16 bit nature of the Reader API is the signature of the read(char[], ...) method. How would that deal with codepoints greater than 65535 if surrogate pairs weren't used?)
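A small sketch of the surrogate-pair behaviour described above, using an assumed supplementary codepoint (U+1F600) and a StringReader for self-containment:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class SurrogatePairDemo {
    public static void main(String[] args) throws IOException {
        // One codepoint above U+FFFF becomes two Java chars (a surrogate pair).
        String s = new String(Character.toChars(0x1F600));
        System.out.println(s.length());          // 2, even though it is one codepoint

        try (Reader r = new StringReader(s)) {
            int first = r.read();
            int second = r.read();
            System.out.println(Integer.toHexString(first));   // d83d (high surrogate)
            System.out.println(Integer.toHexString(second));  // de00 (low surrogate)
            System.out.println(r.read());                     // -1: end of stream
        }
    }
}
```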
EDIT
The case of DataOutputStream.writeChar(int) does seem a bit strange. However, the javadoc clearly states that the argument is written as a 2 byte value. And in fact, the implementation clearly writes only the bottom two bytes to the underlying stream.
I don't think that there is a good reason for this. Anyway, there is a bug database entry for this (4957024), which is marked as "11-Closed, Not a Defect" with the following comment:
"This isn't a great design or excuse, but it's too baked in for us to change."
... which is kind of an acknowledgement that it is a defect, at least from the design perspective.
But this is not something worth making a fuss about, IMO.
I'm not sure exactly what you're referring to, but perhaps you are thinking of InputStream.read()? It returns an int instead of a byte because the return value is overloaded to also represent end of stream, which is signalled as -1. Since there are 257 different possible return values, a byte is insufficient.
Otherwise, perhaps you could come up with some more specific examples.
There are a few possible explanations.
First, as a couple of people have noted, it might be because read() necessarily returns an int, and so it can be seen as elegant to have write() accept an int to avoid casting:
int read = in.read();
if ( read != -1 )
out.write(read);
//vs
out.write((byte)read);
Second, it might just be nice to avoid other cases of casting:
//write a char (big-endian)
char c;
out.write(c >> 8);
out.write(c);
//vs
out.write( (byte)(c >> 8) );
out.write( (byte)c );
It's correct that the maximum possible code point is 0x10FFFF, which doesn't fit in a char. However, the stream methods are byte-oriented, while the writer methods are 16-bit. OutputStream.write(int) writes a single byte, and Writer.write(int) only looks at the low-order 16 bits.
In Java, Streams are for raw bytes. To write characters, you wrap a Stream in a Writer.
While Writers do have write(int) (which writes the 16 low bits; it's an int because byte is too small, and short is too small due to it being signed), you should be using write(char[]) or write(String) instead.
Probably to be symmetric with the read() method, which returns an int. Nothing serious.