I'm reading Eckel's book, IO chapter, and there is the following code (p. 667).
public static void main(String[] args) throws IOException {
try {
DataInputStream in = new DataInputStream(new ByteArrayInputStream(BufferedInputFile.read("src/io/FormattedMemoryInput.java").getBytes()));
while(true) {
System.out.print((char)in.readByte()); // problem line
}
} catch (EOFException ex) {
System.err.println("End of stream");
}
}
This code works great, but if I change (char) in.readByte() to in.readChar(), it prints some Asian symbols: 灡捫慧攠楯㬊੩浰潲琠橡癡漮⨻੩浰. Why is that, and why doesn't it print the English ASCII characters?
From DataInput.readChar():
Reads two input bytes and returns a char value. Let a be the first byte read and b be the second byte. The value returned is:
(char)((a << 8) | (b & 0xff))
This method is suitable for reading bytes written by the writeChar method of interface DataOutput.
In other words, it's treating your file as if it's UTF-16-encoded - and it almost certainly isn't.
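To make the formula concrete, here is a minimal sketch that feeds ASCII bytes to readChar(). The class name ReadCharDemo and its helper methods are made up for illustration, and the input "package" is only an assumption about what the file's text looks like. The two bytes for 'p' (0x70) and 'a' (0x61) are fused into the single char U+7061, which lies in the CJK range.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the book's code.
class ReadCharDemo {
    // Mirrors the formula quoted from the DataInput.readChar() documentation.
    static char combine(byte a, byte b) {
        return (char) ((a << 8) | (b & 0xff));
    }

    // Encodes the text as ASCII bytes and reads the first two of them with readChar().
    static char firstCharViaReadChar(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readChar();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        char c = firstCharViaReadChar("package");
        System.out.println(Integer.toHexString(c));               // 7061
        System.out.println(c == combine((byte) 'p', (byte) 'a')); // true
    }
}
```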
When you want to read text data you should use a Reader subclass, e.g. InputStreamReader wrapped around FileInputStream, specifying the appropriate encoding for the input data.
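As a rough sketch of that advice, the helper below decodes a byte stream through an InputStreamReader with an explicit charset. The class name ReaderDemo and the method decode are illustrative only, and UTF-8 is an assumption about how the source file is encoded. An in-memory stream stands in for the file so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
class ReaderDemo {
    // Decodes every byte of the stream into text using the given charset.
    static String decode(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] fileBytes = "package io;".getBytes(StandardCharsets.UTF_8);
        // For a real file, pass a FileInputStream instead of the in-memory stream.
        System.out.println(decode(new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8));
    }
}
```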
I have a file input.txt on my system and I want to read data from it using FileInputStream in Java. There is no error in the code, but it still does not work: it does not display any output. Here is the code; can anyone kindly help me out?
package com.company;
import java.io.FileInputStream;
import java.io.InputStream;
public class Main {
public static void main(String[] args) {
// write your code here
byte[] array = new byte[100];
try {
InputStream input = new FileInputStream("input.txt");
System.out.println("Available bytes in the file: " + input.available());
// Read byte from the input stream
input.read(array);
System.out.println("Data read from the file: ");
// Convert byte array into string
String data = new String(array);
System.out.println(data);
// Close the input stream
input.close();
} catch (Exception e) {
e.getStackTrace();
}
}
}
Use the utility class Files:
Path path = Paths.get("input.txt");
try {
String data = Files.readString(path, Charset.defaultCharset());
System.out.println(data);
} catch (Exception e) {
e.printStackTrace();
}
For binary (non-text) data, one should use Files.readAllBytes.
available() is not the file length; it is only an estimate of the number of bytes that can be read without blocking. Reading more may block while the data is physically read from the disk device.
String.getBytes(Charset) and new String(byte[], Charset) explicitly specify the charset of the actual bytes. A String always holds its text as Unicode, so it can combine all the scripts of the world.
Java was designed from the start to treat text as Unicode, in reaction to the situation with C and C++ at the time. So in a String you can mix Arabic, Greek, Chinese and math symbols. For that to work, binary data (byte[], InputStream, OutputStream) must be paired with the encoding (Charset) the bytes are in, and a conversion to Unicode then happens for text (String, char, Reader, Writer).
FileInputStream.read(byte[]) returns the number of bytes actually read, and that result must be used. A single call fills at most one buffer, so the call must be repeated until it returns -1.
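The last point can be sketched as a loop that keeps calling read(byte[]) and copies only the bytes each call actually delivered. The class name ReadLoopDemo and the method readFully are made up for this example, and an in-memory stream stands in for the FileInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class.
class ReadLoopDemo {
    // Reads the stream to the end, honoring the count returned by each read call.
    static byte[] readFully(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[100];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n); // only the first n bytes of the buffer are valid
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = new byte[250]; // larger than the 100-byte buffer
        byte[] copy = readFully(new ByteArrayInputStream(data));
        System.out.println(copy.length); // 250
    }
}
```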
I have a file named mark_method.json containing ABCDE in it and I am reading this file using the InputStream class.
By definition, the InputStream class reads an input stream of bytes. How does this work? I don't have bytes in the file, but characters?
I am trying to understand how a stream reading bytes is reading characters from the file?
public class MarkDemo {
public static void main(String args[]) throws Exception {
InputStream is = null;
try {
is = new FileInputStream("C:\\Users\\s\\Documents\\EB\\EB_02_09_2020_with_page_number_and_quote_number\\Old_images\\mark_method.json");
}
catch(Exception e) {
e.printStackTrace();
} finally {
if(is != null) {
is.close();
}
}
}
}
All data on a computer is stored as bits and bytes, and the content of a file is no exception.
Programs convert these bytes into a human-readable form, which is why you see characters rather than bytes when you open mark_method.json.
A character is a byte (at least in ASCII).
Each byte value from 0 to 127 has a character assigned to it. For example, 0 is the null character, 0xa is \n, 0xd is \r, 0x41 is 'A', and so on.
The stream implementation only knows bytes. It doesn't know that the char 0x2709 is ✉. In UTF-16 it only sees two bytes, 0x27 and 0x09 (in UTF-8 the same character is the three bytes 0xE2, 0x9C and 0x89).
Only the text editor interprets the bytes and shows the matching symbol or letter.
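A small sketch of the byte values involved, using a made-up class EncodingBytesDemo with a hypothetical hex helper: the ASCII text ABCDE becomes five bytes, while ✉ (U+2709) becomes a different byte sequence depending on the encoding chosen.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class.
class EncodingBytesDemo {
    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Five ASCII characters are stored as five bytes.
        System.out.println(Arrays.toString("ABCDE".getBytes(StandardCharsets.US_ASCII))); // [65, 66, 67, 68, 69]
        // The same character has different byte representations per encoding.
        System.out.println(hex("\u2709".getBytes(StandardCharsets.UTF_8)));    // e2 9c 89
        System.out.println(hex("\u2709".getBytes(StandardCharsets.UTF_16BE))); // 27 09
    }
}
```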
I think what you are actually asking here is how to convert the bytes you read from the file using FileInputStream into a Java String object that you can print and manipulate.
FileInputStream does not have any read methods for directly producing a String object so if that is what you want, you need to further manipulate the input you get.
Option one is to use the Scanner class:
Scanner scanner = new Scanner(is);
String word = scanner.next();
Another option is to read the bytes and use the constructor of the String class that works with byte array:
byte [] bytes = new byte[10];
is.read(bytes);
String text = new String(bytes);
Note that for simplicity I just assumed you can read 10 valid bytes from your file.
In real code you would need some logic to make sure you are reading correct number of bytes.
Also, if your file is not stored using your system default character set, you will need to specify the character set as a parameter to the String constructor.
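Putting the two caveats together, a hedged sketch could look like the following. The class BytesToStringDemo and the method readText are hypothetical, and UTF-8 is only an assumed encoding. Note that a single read call on a real file may deliver fewer bytes than requested, so this covers the simple case only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
class BytesToStringDemo {
    // Performs one read of up to maxBytes and decodes only the bytes actually read.
    static String readText(InputStream is, int maxBytes) {
        byte[] bytes = new byte[maxBytes];
        try {
            int n = is.read(bytes);
            if (n == -1) {
                return ""; // stream was already at its end
            }
            return new String(bytes, 0, n, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream is = new ByteArrayInputStream("ABCDE".getBytes(StandardCharsets.UTF_8));
        System.out.println(readText(is, 10)); // ABCDE
    }
}
```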
Finally, you can use another wrapper class, BufferedReader that has a readLine function which takes care of all the logic needed to read bytes representing a line of text from a file and return them in a String.
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
String line = in.readLine();
I'm currently teaching myself Java IO. I'm able to read basic ASCII characters from a .txt file, but when I get to other Latin-1 characters, or characters within the 255 range, it prints 194 instead of the correct decimal character code.
For example, I can read abcdefg from the txt file, but if I throw in a character like ©, I don't get 169; for some reason I get 194. I tried testing this by just printing all chars between 1 and 255 with a loop, and that works. Reading this input does not seem to, though, so I'm a little perplexed. I understand I can use a reader object or whatever, but I want to cover the basics first by learning the byte streams. Here is what I have:
InputStream io = null;
try{
io = new FileInputStream("thing.txt");
int yeet = io.read();
System.out.println(yeet);
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
UTF-8 encoding table and Unicode characters
You can see there that the hex code for © in UTF-8 is c2 a9, i.e. 194 169. Your file is evidently UTF-8 encoded, so © is stored as two bytes, and a single read() returns only the first of them, which is 194.
P.S. "Read a file character by character/UTF8" is another good example of Java encodings, code points, etc.
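A quick way to check this is to encode © yourself and look at the first byte a stream returns. The class CopyrightBytesDemo and the method firstByte are made up for this sketch, and the two charsets are just the candidates being compared.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
class CopyrightBytesDemo {
    // Encodes the text with the given charset and returns what a single read() yields.
    static int firstByte(String text, Charset charset) {
        return new ByteArrayInputStream(text.getBytes(charset)).read();
    }

    public static void main(String[] args) {
        String copyright = "\u00A9"; // the © character
        System.out.println(firstByte(copyright, StandardCharsets.UTF_8));      // 194
        System.out.println(firstByte(copyright, StandardCharsets.ISO_8859_1)); // 169
    }
}
```

If the file were saved as ISO-8859-1 (Latin-1), a single read() would return 169 as the question expected.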
I have two suggestions for you.
First, there is a full walkthrough of the book on this site.
Second, here is some sample code:
public class Example {
public static void main(String[] args) throws Exception {
String str = "hey\u6366";
byte[] charset = str.getBytes("UTF-8");
String result = new String(charset, "UTF-8");
System.out.println(result);
}
}
Output:
hey捦
Let us walk through the program above. First, we convert the given Unicode string to UTF-8 bytes using the getBytes() method, so that we can verify the round trip:
String str = "hey\u6366";
byte[] charset = str.getBytes("UTF-8");
Then we convert the charset byte array back to a Unicode string by creating a new String object:
String result = new String(charset, "UTF-8");
System.out.println(result);
Good luck
This code reads a Notepad file.
The file contains the number 10.
For some reason it returns a gibberish letter instead of 10.
I think it is the ASCII code, but I don't know.
Also, this code is modified from my programming teacher's code, so I do not take credit for it.
/**
 * Goes into the file and extracts a number.
 * @param fileName
 * @return an integer
 */
static int getNumberFromFile(String fileName){
int j = 599;
try {
File textFile = new File(fileName);
Scanner sc = new Scanner(textFile);
String input = sc.nextLine();
j = Integer.parseInt(input);
} catch (Exception e) {
System.out.println("Exception: " + e);
}
return j;
}
It throws this weird exception:
Exception: java.lang.NumberFormatException: For input string: "10"
The file is written by this code:
/**
 * Writes data for the AI to adapt its strategy.
 * @param number is the number to write
 * @param fileName is the fileName
 */
public static void writeToFile(String fileName,int number) {
BufferedWriter output = null;
try {
File aFile = new File(fileName);
FileWriter myWriter = new FileWriter(aFile);
output = new BufferedWriter(myWriter);
output.write(number);
output.newLine();
output.close();
} catch (Exception e) {
System.out.println("Exception:" + e);
System.out.println("please Report this bug it doesnt understand");
System.exit(1);
}
}
Don't worry about the exception handling; the catch blocks were for me to see whether the exceptions are caught, and they just print a (nonsense) message. Don't worry about the places where I talk about an AI either; I just need this code working. I can post why the AI needs it, but I don't think it is relevant.
This line doesn't do what you expect:
output.write(number);
It's calling write on a BufferedWriter, so you should consult the documentation... at which point you find you're calling this method.
public void write(int c)
throws IOException
Writes a single character.
Overrides:
write in class Writer
Parameters:
c - int specifying a character to be written
And following the write link gives more details:
Writes a single character. The character to be written is contained in the 16 low-order bits of the given integer value; the 16 high-order bits are ignored.
Subclasses that intend to support efficient single-character output should override this method.
So, you're writing the Unicode character U+000A - or would be if the value were really 10. I strongly suspect it's not though, as that would just be a line feed character.
If you're trying to write the decimal representation of the number though, you should turn it into a string first:
output.write(String.valueOf(number));
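The difference can be seen in a small sketch. It uses a StringWriter in place of the BufferedWriter so the result can be inspected in memory. Both are Writer subclasses and share the write(int) contract quoted above. The class WriteIntDemo and its two methods are made up for illustration.

```java
import java.io.StringWriter;

// Hypothetical demo class.
class WriteIntDemo {
    // write(int) emits the single character whose code is the given number.
    static String viaWriteInt(int number) {
        StringWriter w = new StringWriter();
        w.write(number);
        return w.toString();
    }

    // write(String) emits the decimal digits of the number.
    static String viaWriteString(int number) {
        StringWriter w = new StringWriter();
        w.write(String.valueOf(number));
        return w.toString();
    }

    public static void main(String[] args) {
        System.out.println(viaWriteInt(10).equals("\n")); // true: a lone line feed
        System.out.println(viaWriteString(10));           // 10
        System.out.println(viaWriteInt(65));              // A
    }
}
```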
I'm trying to read a file containing ordinary text data using a byte stream. I understand that a byte stream reads the bytes one by one. Hence, if I read the data Hi How are you!!!!!! from the text file through a byte stream, it should give me the Unicode equivalent of each character, but instead it gives me different output that doesn't map to the UTF or ASCII equivalents.
Below is my Program
package files;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
public class FileBaiscs {
public static void main(String[] args) throws IOException {
String strPath = "D:\\Files\\New.txt";
File objSrcFile = new File (strPath);
if (objSrcFile.exists()==false)
{
System.out.println("The Source File is not Present");
System.exit(0);
}
FileInputStream objIpStream = new FileInputStream(objSrcFile);
while ((objIpStream.read())!=-1)
{
System.out.println(objIpStream.read());
}
objIpStream.close();
}
}
The output in my console is:
105
72
119
97
101
121
117
33
33
33
The data in the New.txt file is: Hi How are you!!!!!!
I expect the output to be the integers that are the UTF equivalents of each of the characters above. Kindly let me know if my understanding is wrong.
Here
while ((objIpStream.read())!=-1)
{
System.out.println(objIpStream.read());
}
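You are reading two bytes instead of one per iteration: the first is read in the loop condition and discarded, and the second is read in the body of the loop.
What you should do is:
int b; // read() returns an int so that -1 can signal the end of the stream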
while ((b=objIpStream.read())!=-1)
{
System.out.println(b);
}
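To see the effect without a file, here is a sketch that runs both loop shapes over the same bytes in memory. The class SkippedBytesDemo and its methods are invented for this example. The input is the text from the question, and ASCII is assumed as its encoding.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class.
class SkippedBytesDemo {
    // The question's loop shape: one read in the condition, another in the body.
    static List<Integer> readTwicePerIteration(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        List<Integer> seen = new ArrayList<>();
        while (in.read() != -1) { // this byte is consumed and thrown away
            seen.add(in.read());
        }
        return seen;
    }

    // The corrected loop shape: each byte is read once and kept.
    static List<Integer> readOncePerIteration(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        List<Integer> seen = new ArrayList<>();
        int b;
        while ((b = in.read()) != -1) {
            seen.add(b);
        }
        return seen;
    }

    public static void main(String[] args) {
        byte[] data = "Hi How are you!!!!!!".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readTwicePerIteration(data)); // [105, 72, 119, 97, 101, 121, 117, 33, 33, 33]
        System.out.println(readOncePerIteration(data).size()); // 20
    }
}
```

The first list matches the console output shown in the question: every second byte of the text.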
Your misunderstanding comes from the fact that you think bytes are characters; they are not.
In order to read characters, you have to convert your bytes to characters first, and this is done using a process known as character encoding. An InputStream won't perform this operation, but a Reader will.
Therefore, try this:
final Path path = Paths.get("D:\\Files\\New.txt");
try (
final BufferedReader reader = Files.newBufferedReader(path,
StandardCharsets.UTF_8);
) {
int c;
while ((c = reader.read()) != -1)
System.out.println(c);
}
Also, in your original code, you read two bytes per loop.