Determining and printing file size in Java

The method below returns a file size of 2. Since the return type is long, I'm assuming the file size Java calculates is 2*64 bits. But I actually saved a 32-bit int plus a 16-bit char = 48 bits. Why does Java do this conversion? Also, does Java implicitly store everything in the file as a long, whether it is a char or an int? How do I get the accurate size of 48 bits?
public static void main(String[] args)
{
    File f = new File("C:/sam.txt");
    int a= 42;
    char c= '.';
    try {
        try {
            f.createNewFile();
        } catch (IOException e) {
            e.printStackTrace();
        }
        PrintWriter pw = new PrintWriter(f);
        pw.write(a);
        pw.write(c);
        pw.close();
        System.out.println("file size:"+f.length());
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}

No. You wrote two characters. Writers are used for textual data, not for binary data. The documentation of write(int) says:
Writes a single character.
Since the default character encoding of your platform stores those two characters as a single byte (each), the file length is 2 (2 bytes: the length of a file is measured in bytes, as the documentation says). Open the file with a text editor, and see what's in there.
The Java API doc is really useful to know what a class or method does. You should read it.

Both calls to write write a single char, which is 16 bits in memory, but since
new PrintWriter(f)
uses the default character encoding (probably an ASCII-compatible one such as UTF-8 on your system), each of the two characters is encoded as one byte, so 2 bytes are written.
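To make the difference concrete, here is a minimal sketch that writes the same two values once as text and once as binary data and compares the resulting file sizes. The class name FileSizeDemo, the helper names, and the use of temporary files and an explicit US-ASCII encoding are assumptions made for illustration; the question's code used the platform default encoding and a fixed path.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileSizeDemo {
    /** Writes a and c as text via PrintWriter and returns the file size in bytes. */
    static long sizeAsText(int a, char c) {
        try {
            Path p = Files.createTempFile("sam", ".txt"); // hypothetical temp file
            try (PrintWriter pw = new PrintWriter(
                    Files.newBufferedWriter(p, StandardCharsets.US_ASCII))) {
                pw.write(a); // write(int) emits ONE character (code 42 is '*')
                pw.write(c); // one more character
            }
            long size = Files.size(p);
            Files.delete(p);
            return size;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a as a 4-byte int and c as a 2-byte char and returns the file size in bytes. */
    static long sizeAsBinary(int a, char c) {
        try {
            Path p = Files.createTempFile("sam", ".bin"); // hypothetical temp file
            try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(p))) {
                out.writeInt(a);  // 4 bytes, high byte first
                out.writeChar(c); // 2 bytes, high byte first
            }
            long size = Files.size(p);
            Files.delete(p);
            return size;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("text size:   " + sizeAsText(42, '.'));   // 2
        System.out.println("binary size: " + sizeAsBinary(42, '.')); // 6 bytes = 48 bits
    }
}
```

If the goal really is 48 bits on disk, DataOutputStream (or a similar binary API) is the kind of class to reach for; File.length() always reports bytes, so 48 bits shows up as 6.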

Related

The readChar() method displays Japanese characters

I'm trying to write code that picks a word from a file according to an index entered by the user, but the readChar() method of the RandomAccessFile class is returning Japanese characters. I must admit that it's not the first time I've seen this on my Lenovo laptop: in some installation wizards I can see normal characters mixed with Japanese characters. Do you think it comes from the laptop or rather from the code?
This is the code:
package com.project;
import java.io.*;
import java.util.StringTokenizer;
public class Main {
    public static void main(String[] args) throws IOException {
        int N, i=0;
        char C;
        char[] charArray = new char[100];
        String fileLocation = "file.txt";
        BufferedReader buffer = new BufferedReader(new InputStreamReader(System.in));
        do {
            System.out.println("enter the index of the word");
            N = Integer.parseInt(buffer.readLine());
            if (N!=0) {
                RandomAccessFile word = new RandomAccessFile(new File(fileLocation), "r");
                do {
                    word.seek((2*(N-1))+i);
                    C = word.readChar();
                    charArray[i] = C;
                    i++;
                }while(charArray[i-1] != ' ');
                System.out.println("the word of index " + N + " is: " );
                for (char carTemp : charArray )
                    System.out.print(carTemp);
                System.out.print("\n");
            }
        }while(N!=0);
        buffer.close();
    }
}
I get this output:
瑯潕啰灰灥敲牃䍡慳獥攨⠩⤍ഊੴ瑯潌䱯潷睥敲牃䍡慳獥攨⠩⤍ഊ੣捯潭浣捡慴琨⡓却瑲物楮湧朩⤍ഊ੣捨桡慲牁䅴琨⡩楮湴琩⤍ഊੳ獵畢扳獴瑲物楮湧木⠠⁳獴瑡慲牴琠⁩楮湤摥數砬Ⱐ⁥敮湤搠⁩楮湤摥數砩⤍ഊੴ瑲物業洨⠩Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 100 out of bounds for length 100
at Main.main(Main.java:21)

There are many things wrong, all of which have to do with fundamental misconceptions.
First off: A file on your disk - never mind the File interface in Java, or any other programming language; the file itself - does not and cannot store text. Ever. It stores bytes. That is, raw data, as (on every machine that's been relevant for decades, but historically there have been other ways to do it) quantified in bits, which are organized into groups of 8 that are called bytes.
Text is an abstraction; an interpretation of some particular sequence of byte values. It depends - fundamentally and unavoidably - on an encoding. Because this isn't a blog, I'll spare you the history lesson here, but suffice to say that Java's char type does not simply store a character of text. It stores an unsigned two-byte value, which may represent a character of text. Because there are more characters of text in Unicode than two bytes can represent, sometimes two adjacent chars in an array are required to represent a character of text. (And, of course, there is probably code out there that abuses the char type simply because someone wanted an unsigned equivalent of short. I may even have written some myself. That era is a blur for me.)
Anyway, the point is: using .readChar() is going to read two bytes from your file, and store them into a char within your char[], and the corresponding numeric value is not going to be anything like the one you wanted - unless your file happens to be encoded using the same encoding that Java uses natively, called UTF-16.
You cannot properly read and interpret the file without knowing the file encoding. Full stop. You can at best delude yourself into believing that you can read it. You also cannot have "random access" to a text file - i.e., indexing according to a number of characters of text - unless the encoding in question is constant width. (Otherwise, of course, you can't just calculate the distance-in-bytes into the file where a given character of text is; it depends on how many bytes the previous characters took up, which depends on which characters they are.) Many text encodings are not constant width. One of the most popular, UTF-8, which frankly is the sane default recommendation for most tasks these days, is not. In which case you are simply out of luck for the problem you describe.
At any rate, once you know the encoding of your file, the expected way to retrieve a character of text from a file in Java is to use one of the Reader classes, such as InputStreamReader:
An InputStreamReader is a bridge from byte streams to character streams: It reads bytes and decodes them into characters using a specified charset. The charset that it uses may be specified by name or may be given explicitly, or the platform's default charset may be accepted.
(Here, charset simply means an instance of the class that Java uses to represent text encodings.)
You may be able to fudge your problem description a little bit: seek to a byte offset, and then grab the text characters starting at that offset. However, there is no guarantee that the "text characters starting at that offset" make any sense, or in fact can be decoded at all. If the offset happens to be in the middle of a multi-byte encoding for a character, the remaining part isn't necessarily valid encoded text.
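As a rough sketch of the decode-first approach: read the bytes, decode them with a charset you actually know, and only then index into the text. This assumes that "index" means the 1-based number of a whitespace-separated word and that the file is small enough to hold in memory. The class WordByIndex and the method wordAt are hypothetical names, and the sample bytes stand in for the contents of file.txt.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WordByIndex {
    /**
     * Decodes the raw file bytes with the given charset and returns the n-th
     * whitespace-separated word (1-based), or null if there is no such word.
     */
    static String wordAt(byte[] fileBytes, Charset charset, int n) {
        String text = new String(fileBytes, charset).trim(); // bytes -> text happens here
        if (text.isEmpty()) {
            return null;
        }
        String[] words = text.split("\\s+");
        return (n >= 1 && n <= words.length) ? words[n - 1] : null;
    }

    public static void main(String[] args) {
        // Stand-in for Files.readAllBytes(Paths.get("file.txt")) so the sketch runs anywhere.
        byte[] fileBytes = "toUpperCase()\r\ntoLowerCase()\r\nconcat(String)"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(wordAt(fileBytes, StandardCharsets.UTF_8, 2)); // toLowerCase()
    }
}
```

Because the decoding happens before any indexing, the same method works for variable-width encodings such as UTF-8, at the cost of reading the whole file.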

char is 16 bits, i.e. 2 bytes.
seek seeks to a byte boundary.
If the file contains chars then they are at even offsets: 0, 2, 4...
The expression (2*(N-1))+i is even iff i is even; if i is odd, you are sure to land in the middle of a char, and thus read garbage.
i starts at zero, but you increment by 1, i.e., half a character.
Your seek argument should probably be (2*(N-1+i)).
Alternative explanation: your file does not contain chars at all; for example, you created an ASCII file in which a character is a single byte.
In that case, the error is attempting to read ASCII (an obsolete character encoding) with a readChar function.
But if the file contains ASCII, the purpose of multiplying by 2 in the seek argument is obscure. It apparently serves no useful purpose.
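A small sketch of the even-offset rule, assuming the file really is UTF-16BE without a byte order mark and contains only characters that fit in a single char. The class SeekCharDemo and its helpers are hypothetical names, and the temporary file stands in for file.txt.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SeekCharDemo {
    /** Reads the char at the given 0-based character index of a BOM-less UTF-16BE file. */
    static char charAt(Path file, long charIndex) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(2 * charIndex); // always an even byte offset
            return raf.readChar();   // reads two bytes, high byte first
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a temporary UTF-16BE file (no byte order mark) holding the given text. */
    static Path writeUtf16be(String text) {
        try {
            Path p = Files.createTempFile("words", ".txt");
            Files.write(p, text.getBytes(StandardCharsets.UTF_16BE));
            p.toFile().deleteOnExit();
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path p = writeUtf16be("hello world");
        System.out.println(charAt(p, 4)); // o
        System.out.println(charAt(p, 6)); // w
    }
}
```

Note that a file saved as "UTF-16" by an editor often starts with a 2-byte byte order mark and may be little-endian, in which case the offsets and byte order used here would need adjusting.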

I changed the encoding of the file to UTF-16 and modified the program to display the right indexes, those that represent the beginning of each word. Now it works fine. Thank you, guys.
import java.io.*;
public class Main {
    public static void main(String[] args) throws IOException {
        int N, i=0, j=0, k=0;
        char C;
        char[] charArray = new char[100];
        String fileLocation = "file.txt";
        BufferedReader buffer = new BufferedReader(new InputStreamReader(System.in));
        DataInputStream in = new DataInputStream(new FileInputStream(fileLocation));
        boolean EOF=false;
        do {
            try {
                j++;
                C = in.readChar();
                if((C==' ')||(C=='\n')){
                    System.out.print(j+1+"\t");
                }
            }catch (IOException e){
                EOF=true;
            }
        }while (EOF!=true);
        System.out.println("\n");
        do {
            System.out.println("enter the index of the word");
            N = Integer.parseInt(buffer.readLine());
            if (N!=0) {
                RandomAccessFile word = new RandomAccessFile(new File(fileLocation), "r");
                do {
                    word.seek((2*(N-1+i)));
                    C = word.readChar();
                    charArray[i] = C;
                    i++;
                }while(charArray[i-1] != ' ' && charArray[i-1] != '\n');
                System.out.print("the word of index " + N + " is: " );
                for (char carTemp : charArray )
                    System.out.print(carTemp);
                System.out.print("\n");
                i=0;
                charArray = new char[100];
            }
        }while(N!=0);
        buffer.close();
    }
}

How do FileInputStream and FileOutputStream work in Java?

I'm reading about input/output streams in Java in the Java Tutorials. The tutorial uses this example:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
public class CopyBytes {
    public static void main(String[] args) throws IOException {
        FileInputStream in = null;
        FileOutputStream out = null;
        try {
            in = new FileInputStream("xanadu.txt");
            out = new FileOutputStream("outagain.txt");
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
        } finally {
            if (in != null) {
                in.close();
            }
            if (out != null) {
                out.close();
            }
        }
    }
}
xanadu.txt File data:
In Xanadu did Kubla Khan
A stately pleasure-dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
Output to outagain.txt file:
In Xanadu did Kubla Khan
A stately pleasure-dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
Why do the writers use int c even though we are reading characters?
Why use -1 in the while condition?
How does the out.write(c) method convert the int back to characters?

1: Now I want to ask why writer use int c? even we are reading characters.
FileInputStream.read() returns one byte of data as an int. This works because a byte can be represented as an int without loss of precision. See this answer to understand why int is returned instead of byte.
2: The second why use -1 in while condition?
When the end of file is reached, -1 is returned.
3: How out.write(c); method convert int to again characters? that provide same output in outagain.txt file
FileOutputStream.write() takes a byte parameter as an int. Since an int spans over more values than a byte, the 24 high-order bits of the given int are ignored, making it a byte-compatible value: an int in Java is always 32 bits. By removing the 24 high-order bits, you're down to a 8 bits value, i.e. a byte.
I suggest you carefully read the Javadocs for each of those methods. For reference, they answer all of your questions:
read:
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.
write:
Writes the specified byte to this output stream. The general contract for write is that one byte is written to the output stream. The byte to be written is the eight low-order bits of the argument b. The 24 high-order bits of b are ignored.
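The contract quoted above can be checked with in-memory streams, which follow the same InputStream and OutputStream rules as the file streams. This is only an illustrative sketch; the class ByteAsIntDemo and its method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

class ByteAsIntDemo {
    /** Copies bytes one at a time, using the same loop as CopyBytes. */
    static byte[] copy(byte[] src) {
        ByteArrayInputStream in = new ByteArrayInputStream(src);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int c;
        while ((c = in.read()) != -1) { // -1 can never be confused with a data byte
            out.write(c);
        }
        return out.toByteArray();
    }

    /** Returns what read() reports for the first byte of src, or -1 if src is empty. */
    static int firstRead(byte[] src) {
        return new ByteArrayInputStream(src).read();
    }

    /** Returns the single byte that write(value) actually stores. */
    static byte written(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(value);
        return out.toByteArray()[0];
    }

    public static void main(String[] args) {
        System.out.println(firstRead(new byte[] {(byte) 0xFF})); // 255, not -1
        System.out.println(firstRead(new byte[0]));              // -1: end of stream
        System.out.println(written(0x141));                      // 65: only the low 8 bits (0x41) are kept
    }
}
```

Returning an int is what leaves room for the extra value -1: all 256 byte values map to 0..255, so the end-of-stream marker cannot collide with real data.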

Just read the docs.
here is the read method docs
http://docs.oracle.com/javase/7/docs/api/java/io/FileInputStream.html#read()
public int read()
throws IOException
Reads a byte of data from this input stream. This method blocks if no input is yet available.
Specified by:
read in class InputStream
Returns:
the next byte of data, or -1 if the end of the file is reached.
That int holds the next byte of data.
Now, here are the answers.
1) read() returns the next byte as an int in the range 0 to 255. For ASCII text, that value is the character's ASCII code.
If you are interested, here is a list of chars and their ASCII codes: https://www.cs.cmu.edu/~pattis/15-1XX/common/handouts/ascii.html
2) read() returns -1 when the end of the file is reached, so the loop condition checks whether any data is left.
3) When you pass that value to FileOutputStream.write(int), it writes the low-order byte unchanged, so the output file contains the same bytes, and therefore the same characters, as the input.

writeDouble() method of DataOutputStream is writing data in text document in encoded form

I have the following code
public static void main(String aed[]){
    double d=17.3;
    try{
        DataOutputStream out=null;
        out=new DataOutputStream(new BufferedOutputStream(new FileOutputStream("new.txt")));
        out.writeDouble(d);
        out.flush();
    }catch(FileNotFoundException fnf){
        fnf.printStackTrace();
    }catch(IOException io){
        io.printStackTrace();
    }
}
I am writing this double value to a text file, new.txt, but the following value ends up in the file:
#1LÌÌÌÌÍ
But when I use
out.writeUTF(""+d)
it works fine.
Please explain the encoding that is going on here.

In Java there are two kinds of variables: primitive types and reference types.
The primitive types are int, double, byte, char, boolean, long, short and float. Each stores a single value directly, in a fixed size that depends on the type (for example, 4 bytes for an int and 8 bytes for a double); only char is a 16-bit unsigned value holding a UTF-16 code unit.
Reference types hold references to objects. String is a reference type, and writeUTF writes the string's characters in an encoded form that a text editor can largely display, which is why the actual value is visible.
A binary file is not meant to be read by you but by a program that fetches the values in the correct form and order. The methods you are using should be used solely for writing to a binary file (.dat), which holds actual data values in their respective forms (int, double, etc.). When writing to a text file (.txt), only text, i.e. strings, should be written.
Writing to a text file:
try {
    PrintWriter write = new PrintWriter("your filepath");
    write.println("whatever needs to be written");
    write.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
Reading:
try {
    Scanner read = new Scanner(new FileReader("your path"));
    while (read.hasNextLine()) {
        System.out.println(read.nextLine());
    }
    read.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
}

With DataOutputStream you are writing bytes, the bytes that represent a double value (which is a number value) and not the readable version of that number.
Example:
int i = 8;
In binary, the value of i is '1000', and that is the value the computer manages. But you don't want to write the bits '1000', because you want something readable, not the raw value; you want the CHARACTER '8', so you must convert the number to characters (converting to a String also works, because a String is readable).
And that's what you are doing with ("" + d): converting it to a String.
Use a Writer to write text files (BufferedWriter and FileWriter are available).

The writeDouble(double) method does not use UTF-8 encoding. If you have written a double using writeDouble(), you should read it back with the readDouble method of DataInputStream. These files are not meant to be read or modified manually. If you want readable output, writeUTF is closer, although it also prefixes the string with two length bytes; for a plain text file, use a Writer.
From Documentation -
writeDouble -
Converts the double argument to a long using the doubleToLongBits method in class Double, and then writes that long value to the underlying output stream as an 8-byte quantity, high byte first.

writeDouble (like writeByte, writeShort, etc., each with its corresponding size) writes the 8 bytes of the double value's binary representation. That is why the class is called DataOutputStream (data).
writeUTF writes 2 bytes of length followed by the actual string.
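To see those bytes directly, the following sketch writes into an in-memory buffer and prints each byte in hex. The class DataBytesDemo and its helper names are hypothetical; only DataOutputStream.writeDouble and writeUTF come from the discussion above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class DataBytesDemo {
    /** Returns the bytes that writeDouble(d) produces. */
    static byte[] doubleBytes(double d) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeDouble(d); // 8 bytes of IEEE 754 bits, high byte first
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Returns the bytes that writeUTF(s) produces. */
    static byte[] utfBytes(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s); // 2 length bytes, then the characters in modified UTF-8
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Formats bytes as space-separated two-digit hex values. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(doubleBytes(17.3))); // 40 31 4c cc cc cc cc cd
        System.out.println(hex(utfBytes("17.3")));  // 00 04 31 37 2e 33
    }
}
```

The second line shows why the writeUTF output looks readable in an editor: after the two length bytes, 31 37 2e 33 are simply the ASCII codes of '1', '7', '.' and '3'.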

java.io.DataOutputStream.writeUTF(String str) writes two bytes of length information to the output stream, followed by the modified UTF-8 representation of every character in the string str.
writeDouble(double v)
Converts the double argument to a long using the doubleToLongBits
method in class Double, and then writes that long value to the
underlying output stream as an 8-byte quantity, high byte first.
Read the Javadoc:
https://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html

Reading a character at random place from file in java?

When I read from a file using readChar() in the RandomAccessFile class, I get unexpected output.
Instead of the desired character, ? is displayed.
package tesr;
import java.io.RandomAccessFile;
import java.io.IOException;
public class Test {
    public static void main(String[] args) {
        try{
            RandomAccessFile f=new RandomAccessFile("c:\\ankit\\1.txt","rw");
            f.seek(0);
            System.out.println(f.readChar());
        }
        catch(IOException e){
            System.out.println("dkndknf");
        }
        // TODO Auto-generated method stub
    }
}

You probably intended readByte. readChar reads 2 bytes and combines them, high byte first, into a Java char, which is a UTF-16 code unit. On arbitrary file data the result is very often not a displayable character: it may be an unassigned code point or half of a surrogate pair (two chars that together form one Unicode code point). A character that cannot be converted to the console's encoding is printed as a question mark, which is what happens in your case.
If you know in what encoding the file is in, then for a single byte encoding it is simple:
byte b = in.readByte();
byte[] bs = new byte[] { b };
String s = new String(bs, "Cp1252"); // Some single byte encoding
For the variable-length, multi-byte UTF-8 it is also simple to identify where a sequence of bytes starts:
single byte when the high bit is 0
otherwise a continuation byte when the high bits are 10
otherwise a starting byte (with some special cases) telling the number of bytes by its high bits.
For UTF-16LE and UTF-16BE the file position must be a multiple of 2, and each char is 2 bytes long.
byte[] bs = new byte[2];
in.readFully(bs); // unlike read, readFully always fills the whole array
String s = new String(bs, StandardCharsets.UTF_16LE);
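The UTF-8 rules listed above can be sketched as a small classifier plus a helper that backs up from an arbitrary byte offset to the start of the character it falls in. The class Utf8Bytes and both method names are hypothetical, and the sketch deliberately ignores the special cases (overlong and otherwise invalid lead bytes such as 0xC0 are not rejected).

```java
import java.nio.charset.StandardCharsets;

class Utf8Bytes {
    /**
     * Returns the sequence length announced by a lead byte (1 to 4),
     * 0 for a continuation byte, or -1 for a byte that cannot start a sequence.
     */
    static int sequenceLength(byte b) {
        int u = b & 0xFF;
        if ((u & 0x80) == 0x00) return 1; // 0xxxxxxx: single byte
        if ((u & 0xC0) == 0x80) return 0; // 10xxxxxx: continuation byte
        if ((u & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((u & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((u & 0xF8) == 0xF0) return 4; // 11110xxx
        return -1;
    }

    /** Moves offset back to the first byte of the character it falls inside. */
    static int alignToCharStart(byte[] utf8, int offset) {
        while (offset > 0 && sequenceLength(utf8[offset]) == 0) {
            offset--;
        }
        return offset;
    }

    public static void main(String[] args) {
        // 'a' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes)
        byte[] bytes = "a\u00e9\u20ac".getBytes(StandardCharsets.UTF_8);
        int start = alignToCharStart(bytes, 4); // offset 4 is inside the 3-byte sequence
        int length = sequenceLength(bytes[start]);
        System.out.println("start=" + start + " length=" + length); // start=3 length=3
        System.out.println(new String(bytes, start, length, StandardCharsets.UTF_8));
    }
}
```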

You almost certainly have a character encoding problem. It is not possible to simply read characters from a file. What must be done is that an appropriate sequence of bytes are read, then those bytes are interpreted according to a character encoding scheme to translate them to a character. When you want to read a file as text, Java must be told, perhaps implicitly, which character encoding to use.
If you tell Java the wrong encoding you will get gibberish. If you pick an arbitrary point in a file and start reading, and that location is not the start of the encoding of a character, you will get gibberish. One or both of those has happened in your case.

Curious about some Java out.Write output

When I run the following code:
int i = 0;
try {
    FileWriter fstream = new FileWriter(filename);
    BufferedWriter out = new BufferedWriter(fstream);
    while (i < 100) {
        out.write("My Name is Bobby Bob");
        out.write(i);
        out.newLine();
        i++;
    }
    out.flush();
    out.close();
} catch (IOException e) {
    e.getClass();
}
I get the following in my output file:
My Name is Bobby Bob
x100
Each one is followed by a weird symbol: male sign, female sign, etc.
My question is more of a curious one. What causes these weird symbols to appear? I was expecting numbers as it counted up. Where are these symbols pulled from?

Is out a RandomAccessFile?
I think you are using write(int) instead of write(String), so you are writing the character whose code is i; see an ASCII table for the representations.
Try
write(""+i);
Looking at the BufferedWriter Java API:
http://download.oracle.com/javase/1.4.2/docs/api/java/io/BufferedWriter.html#write(int)
it says that write(int) writes a single character, taking the character's integer code as its argument.
If you want the character '0' to appear, you have to write the value 48, which is the ASCII code of '0'.

When you write
out.write(i);
it writes the (char) i, not the number in i as text.
If you want to write i as a number, use print (available if out is a PrintWriter rather than a BufferedWriter)
out.print(i);
or
out.write(String.valueOf(i));
or
out.write(""+i);

It looks like you are writing single characters that are specified by the given int value, and not a character representation of the variable i.
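A short sketch of the difference, using a StringWriter so the result can be inspected in memory; the class WriteIntDemo and its method names are made up for the example. Every Writer, including BufferedWriter, treats write(int) the same way.

```java
import java.io.StringWriter;

class WriteIntDemo {
    /** write(int): the argument is a character code, so exactly one character is written. */
    static String asChar(int i) {
        StringWriter w = new StringWriter();
        w.write(i);
        return w.toString();
    }

    /** write(String): the decimal digits of the number are written as text. */
    static String asText(int i) {
        StringWriter w = new StringWriter();
        w.write(String.valueOf(i));
        return w.toString();
    }

    public static void main(String[] args) {
        System.out.println(asChar(65));              // A
        System.out.println(asText(65));              // 65
        System.out.println((int) asChar(11).charAt(0)); // 11: a control character, not the digits "11"
    }
}
```

Codes 0 to 31 are control characters, and how they are displayed depends on the viewer. In a viewer that uses the old IBM PC code page 437, for instance, codes 11 and 12 are drawn as the male and female signs, which would match the symbols described in the question (this is an assumption about the asker's editor, not something the question states).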

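If out is a PrintStream such as System.out, the PrintStream.write(int) method writes a single byte to the output stream with the byte value specified; here out is a BufferedWriter, whose write(int) writes a single character instead. Either way, you're not writing the integers 0 through 99 as text, you're writing the bytes (or characters) with codes 0 through 99. You probably want print(int), or write(String.valueOf(i)), instead of write(int).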
