FileInputStream read method keeps returning 194 - java

I'm currently teaching myself Java I/O. I can read basic ASCII characters from a .txt file, but when I get to other Latin-1 characters (anything up to 255), read() returns 194 instead of the character's decimal code.
For example, I can read abcdefg from the text file, but if I throw in a character like ©, I don't get 169; for some reason I get 194. I tested this by printing all chars between 1 and 255 in a loop, and that works. Reading this input does not, so I'm a little perplexed. I understand I could use a Reader, but I want to cover the basics first by learning the byte streams. Here is what I have:
InputStream io = null;
try{
io = new FileInputStream("thing.txt");
int yeet = io.read();
System.out.println(yeet);
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}

UTF-8 encoding table and Unicode characters
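You can see here that the hex code for © is c2 a9, i.e. 194 169 in decimal. It seems that your file is encoded as UTF-8, in which © (U+00A9) takes two bytes, and a single read() call returns only the first byte, which is 194. Endianness is not involved: UTF-8 is defined as a byte sequence and has no byte-order variants.
P.S. "Read a file character by character/UTF8" is another good example of Java encodings, code points, etc.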
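To see this concretely, here is a minimal sketch. The class name CopyrightBytes, the helper methods and the temporary file are illustrative assumptions, not part of the question. It writes © as UTF-8, reads it back byte by byte, and then decodes it with an InputStreamReader:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CopyrightBytes {

    // Reads the first two raw bytes of the file, the way InputStream.read() sees them.
    static int[] firstTwoBytes(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return new int[] { in.read(), in.read() };
        }
    }

    // Decodes the file as UTF-8 and returns the code of its first character.
    static int firstChar(Path file) throws IOException {
        try (Reader reader = new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8)) {
            return reader.read();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for thing.txt, saved as UTF-8.
        Path file = Files.createTempFile("thing", ".txt");
        try {
            Files.write(file, "\u00A9".getBytes(StandardCharsets.UTF_8));
            int[] raw = firstTwoBytes(file);
            System.out.println(raw[0] + " " + raw[1]); // 194 169 (0xC2 0xA9)
            System.out.println(firstChar(file));       // 169 (U+00A9)
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The byte stream is behaving correctly here: 194 and 169 really are the two bytes in the file. Getting 169 back requires a decoding step that knows the file is UTF-8.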

I have some solutions for you.
The first solution
There is a full understanding of the book on this site
The second solution
Here is some sample code for you:
public class Example {
public static void main(String[] args) throws Exception {
String str = "hey\u6366";
byte[] charset = str.getBytes("UTF-8");
String result = new String(charset, "UTF-8");
System.out.println(result);
}
}
Output:
hey捦
Let us walk through the program above. First, we convert the given Unicode string to UTF-8 bytes using the getBytes() method:
String str = "hey\u6366";
byte[] charset = str.getBytes("UTF-8");
Then we convert the charset byte array back to Unicode text by creating a new String object that decodes the bytes as UTF-8:
String result = new String(charset, "UTF-8");
System.out.println(result);
Good luck

Related

Problem in reading text from the file using FileInputStream in Java

I have a file input.txt on my system and I want to read data from that file using FileInputStream in Java. The code compiles without errors, but it still does not work: it does not display any output. Here is the code; can anyone help me out?
package com.company;
import java.io.FileInputStream;
import java.io.InputStream;
public class Main {
public static void main(String[] args) {
// write your code here
byte[] array = new byte[100];
try {
InputStream input = new FileInputStream("input.txt");
System.out.println("Available bytes in the file: " + input.available());
// Read byte from the input stream
input.read(array);
System.out.println("Data read from the file: ");
// Convert byte array into string
String data = new String(array);
System.out.println(data);
// Close the input stream
input.close();
} catch (Exception e) {
e.getStackTrace();
}
}
}
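Use the utility class Files (Files.readString requires Java 11 or later).
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
// Inside a method:
Path path = Paths.get("input.txt");
try {
String data = Files.readString(path, Charset.defaultCharset());
System.out.println(data);
} catch (Exception e) {
e.printStackTrace();
}
For binary (non-text) data, one should use Files.readAllBytes.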
available() is not the file length; it is only an estimate of the number of bytes that can be read without blocking. Reading more may block while the data is physically read from the disk.
String.getBytes(Charset) and new String(byte[], Charset) explicitly specify the charset of the actual bytes. The String itself always holds the text as Unicode, so it can combine all the scripts of the world.
Java was designed with text as Unicode, in response to the situation at the time with C and C++. So in a String you can mix Arabic, Greek, Chinese and math symbols. For that to work, binary data (byte[], InputStream, OutputStream) must be accompanied by the encoding (Charset) the bytes are in, and a conversion to Unicode then happens for text (String, char, Reader, Writer).
FileInputStream.read(byte[]) returns the number of bytes actually read, and that result must be used. A single call fills at most one buffer, so the call must be repeated until it returns -1.
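The last point can be sketched as a read loop. This is only one way to do it; the class name ReadLoop, the helper readAll and the assumption that input.txt is UTF-8 are mine, not the question's:

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadLoop {

    // Reads the stream to the end. read(byte[]) may fill only part of the
    // buffer, so only the first 'count' bytes of each pass are kept.
    static String readAll(InputStream input, Charset charset) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[100];
        int count;
        while ((count = input.read(buffer)) != -1) {
            collected.write(buffer, 0, count);
        }
        // Decode once at the end so a multi-byte character that straddles
        // two buffer fills is not split.
        return new String(collected.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // Assumption: input.txt is encoded as UTF-8.
        try (InputStream input = new FileInputStream("input.txt")) {
            System.out.println(readAll(input, StandardCharsets.UTF_8));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

Note also that the catch block in the question calls e.getStackTrace(), which only returns the stack trace and prints nothing. With printStackTrace(), a failure such as a missing input.txt becomes visible instead of the program silently producing no output.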

The Java.io.InputStream class is the superclass of all classes representing an input stream of bytes. How is it reading a file with characters?

I have a file named mark_method.json containing ABCDE in it and I am reading this file using the InputStream class.
By definition, the InputStream class reads an input stream of bytes. How does this work? I don't have bytes in the file, but characters?
I am trying to understand how a stream reading bytes is reading characters from the file?
public class MarkDemo {
public static void main(String args[]) throws Exception {
InputStream is = null;
try {
is = new FileInputStream("C:\\Users\\s\\Documents\\EB\\EB_02_09_2020_with_page_number_and_quote_number\\Old_images\\mark_method.json");
}
catch(Exception e) {
e.printStackTrace();
} finally {
if(is != null) {
is.close();
}
}
}
}
All data on a computer is stored as bits and bytes. The content of a file is also stored as bytes.
Programs convert these bytes into a human-readable form, which is why we see the mark_method.json file as containing characters and not bytes.
A character is a byte (at least in ASCII).
Each byte value from 0 to 127 has a character assigned to it. For example, 0 is the null character, 0xa is \n, 0xd is \r, 0x41 is 'A', and so on.
The stream implementation only knows bytes. It doesn't know that the char 0x2709 is ✉. In a UTF-16 file, for instance, it only sees two bytes: 0x27 and 0x09.
Only the text editor interprets the bytes and shows the matching symbol or letter.
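As a rough illustration of the points above, the following sketch prints the bytes that a few strings turn into under different encodings. The class name CharBytes and the helper hex are made up for this example:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharBytes {

    // Formats the encoded bytes of the text as space-separated hex pairs.
    static String hex(String text, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("ABCDE", StandardCharsets.US_ASCII));  // 41 42 43 44 45
        System.out.println(hex("ABCDE", StandardCharsets.UTF_8));     // 41 42 43 44 45
        System.out.println(hex("\u2709", StandardCharsets.UTF_16BE)); // 27 09
        System.out.println(hex("\u2709", StandardCharsets.UTF_8));    // E2 9C 89
    }
}
```

For ABCDE the file holds one byte per character, which is why the byte stream appears to read characters. For ✉ the bytes depend on the encoding the file was saved in.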
I think what you are actually asking here is how to convert the bytes you read from a file using FileInputStream into a Java String object that you can print and manipulate.
FileInputStream does not have any read methods that directly produce a String object, so if that is what you want, you need to process the input further.
Option one is to use the Scanner class:
Scanner scanner = new Scanner(is);
String word = scanner.next();
Another option is to read the bytes and use the constructor of the String class that works with byte array:
byte [] bytes = new byte[10];
is.read(bytes);
String text = new String(bytes);
Note that for simplicity I just assumed you can read 10 valid bytes from your file.
In real code you would need some logic to make sure you are reading the correct number of bytes.
Also, if your file is not stored using your system default character set, you will need to specify the character set as a parameter to the String constructor.
Finally, you can use another wrapper class, BufferedReader, which has a readLine method that takes care of all the logic needed to read the bytes representing a line of text from a file and return them as a String.
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
String line = in.readLine();

Convert string from file to ASCII and binary

Say I open a text file like this:
public static void main(String[] args) throws IOException {
String file_name = "file.txt";
try {
ReadFile file = new ReadFile(file_name);
String[] Lines = file.openFile();
for (int i = 0; i < Lines.length; i++) {
System.out.println(Lines[i]);
}
} catch (IOException e) {
System.out.println(e.getMessage());
}
}
Now I want to convert the result to binary (for further conversion into AMI coding). I suppose I should first convert it to ASCII (though I'm not 100% certain that is necessary), but I'm not sure whether I should convert it to chars or whether there is an easier way.
Please keep in mind that I'm just a beginner.
Do you happen to know for sure that the files will be ASCII encoded? Assuming they are, you can just use the getBytes() method of String:
byte[] lineDefault = line.getBytes();
There is a second overload of getBytes() as well, if you don't want to use the default encoding. I often use:
byte[] lineUtf8 = line.getBytes("UTF-8");
which gives byte sequences that are identical to ASCII for characters whose code points are below 0x80.
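Going from those bytes to binary digits could look like the sketch below. The class name ToBinary and the method toBinary are hypothetical, and it assumes the text really is ASCII, since getBytes with US_ASCII replaces any other character with '?':

```java
import java.nio.charset.StandardCharsets;

public class ToBinary {

    // Returns each byte of the line as an 8-digit binary string, separated by spaces.
    static String toBinary(String line) {
        StringBuilder bits = new StringBuilder();
        for (byte b : line.getBytes(StandardCharsets.US_ASCII)) {
            if (bits.length() > 0) {
                bits.append(' ');
            }
            // OR-ing with 0x100 forces nine digits; dropping the first one
            // leaves exactly eight, including leading zeros.
            bits.append(Integer.toBinaryString((b & 0xFF) | 0x100).substring(1));
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBinary("Hi")); // 01001000 01101001
    }
}
```

The resulting string of 0s and 1s can then be fed into whatever line coding step comes next. No separate "convert to ASCII" step is needed beyond choosing the charset in getBytes.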

Formatted memory input

I'm reading Eckel's book, IO chapter, and there is the following code (p. 667).
public static void main(String[] args) throws IOException {
try {
DataInputStream in = new DataInputStream(new ByteArrayInputStream(BufferedInputFile.read("src/io/FormattedMemoryInput.java").getBytes()));
while(true) {
System.out.print((char)in.readByte()); // problem line
}
} catch (EOFException ex) {
System.err.println("End of stream");
}
}
This code works great, but if I change (char) in.readByte() to in.readChar(), it prints Asian characters such as 灡捫慧攠楯㬊੩浰潲琠橡癡⹩漮⨻੩浰. Why is that, and why doesn't it print the English ASCII characters?
From DataInput.readChar():
Reads two input bytes and returns a char value. Let a be the first byte read and b be the second byte. The value returned is:
(char)((a << 8) | (b & 0xff))
This method is suitable for reading bytes written by the writeChar method of interface DataOutput.
In other words, it's treating your file as if it's UTF-16-encoded - and it almost certainly isn't.
When you want to read text data you should use a Reader subclass, e.g. InputStreamReader wrapped around FileInputStream, specifying the appropriate encoding for the input data.
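The effect is easy to reproduce in memory. In this sketch the class name ReadCharDemo and the helper combine are illustrative, and the sample text "package io;" is an assumption about how the source file in the question begins:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ReadCharDemo {

    // Combines two bytes using the formula documented for DataInput.readChar().
    static char combine(byte a, byte b) {
        return (char) ((a << 8) | (b & 0xff));
    }

    public static void main(String[] args) throws IOException {
        byte[] source = "package io;".getBytes(StandardCharsets.US_ASCII);

        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(source))) {
            char c = in.readChar(); // consumes 'p' (0x70) and 'a' (0x61) together
            System.out.println(Integer.toHexString(c));               // 7061
            System.out.println(c == combine((byte) 'p', (byte) 'a')); // true
        }

        // A Reader with an explicit charset decodes one character at a time.
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(source), StandardCharsets.US_ASCII)) {
            System.out.println((char) reader.read()); // p
        }
    }
}
```

U+7061 is 灡, the first character of the garbled output in the question: every pair of ASCII bytes is being fused into one CJK character.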

How to read UTF-8 characters from the file as bytes?

I am not able to read UTF-8 characters from a file as bytes.
The UTF-8 characters are displayed as question marks (?) when the bytes are converted to characters.
The code snippet below shows how the file is read.
Please tell me how to read UTF-8 characters from a file,
and what the problem is with the byte array reading process.
public static void getData() {
FormFile file = actionForm.getFile("UTF-8");
byte[] mybt;
try
{
byte[] fileContents = file.getFileData();
StringBuffer sb = new StringBuffer();
for(int i=0;i<fileContents.length;i++){
sb.append((char)fileContents[i]);
}
System.out.println(sb.toString());
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
}
Output: ??Docum??ents (the input file content is "ÞDocumÿents"; it contains some Spanish characters.)
This is the problem:
for(int i=0;i<fileContents.length;i++){
sb.append((char)fileContents[i]);
}
You're converting each byte to a char just by casting it. That's effectively using ISO-Latin-1.
To read text from an InputStream, you adapt it via InputStreamReader, specifying the character encoding.
The simplest way of reading the whole of a file into a string would be to use Guava:
String text = Files.toString(file, Charsets.UTF_8);
Or to convert a byte array:
String text = new String(fileContents, "UTF-8");
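The difference between casting and decoding can be checked with the standard library alone. This is a minimal sketch; the class and method names are made up, and it assumes the uploaded file is UTF-8, as the question suggests:

```java
import java.nio.charset.StandardCharsets;

public class DecodeBytes {

    // Mirrors the loop from the question: one char per byte, no decoding.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Decodes the whole array as UTF-8 in one step.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00DEDocum\u00FFents"; // "ÞDocumÿents"
        byte[] fileContents = original.getBytes(StandardCharsets.UTF_8);

        // 11 characters, but the two non-ASCII ones take 2 bytes each.
        System.out.println(fileContents.length);                         // 13
        System.out.println(castEachByte(fileContents).equals(original)); // false
        System.out.println(decodeUtf8(fileContents).equals(original));   // true
    }
}
```

Each non-ASCII character occupies two bytes, which matches the two question marks per character in the output above. Because a Java byte is signed, casting a byte of 0x80 or above to char also sign-extends it (0xC3 becomes U+FFC3), so the loop does not even produce the matching Latin-1 characters.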
