I'm trying to read a file containing plain text using a byte stream. I understand that a byte stream reads the data one byte at a time. So if I read the data Hi How are you!!!!!! from the text file through a byte stream, it should give me the Unicode equivalent of each character, but instead it gives me different output which doesn't map to the UTF or ASCII equivalents.
Below is my Program
package files;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
public class FileBaiscs {
public static void main(String[] args) throws IOException {
String strPath = "D:\\Files\\New.txt";
File objSrcFile = new File (strPath);
if (objSrcFile.exists()==false)
{
System.out.println("The Source File is not Present");
System.exit(0);
}
FileInputStream objIpStream = new FileInputStream(objSrcFile);
while ((objIpStream.read())!=-1)
{
System.out.println(objIpStream.read());
}
objIpStream.close();
}
}
The output in my console is:
105
72
119
97
101
121
117
33
33
33
The data in the New.txt file is - Hi How are you!!!!!!
I expect the output to be the integers that are the UTF equivalents of each of the characters above. Kindly let me know if my understanding is wrong.
Here
while ((objIpStream.read())!=-1)
{
System.out.println(objIpStream.read());
}
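You are reading 2 bytes instead of one. The first one is read in the loop condition and the second in the body of the loop, so every other byte is thrown away.
What you should do is
// read() returns an int (0-255, or -1 at end of stream), so store it in an int
int b;
while ((b=objIpStream.read())!=-1)
{
System.out.println(b);
}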
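The skipped bytes can be reproduced without a file. The sketch below is only an illustration: the class name DoubleReadDemo and its two helper methods are made up for this example, and a ByteArrayInputStream stands in for the FileInputStream from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DoubleReadDemo {
    // Pattern from the question: read() is called in the condition AND in the body,
    // so the byte consumed by the condition is never recorded.
    static List<Integer> readTwicePerIteration(InputStream in) throws IOException {
        List<Integer> seen = new ArrayList<>();
        while (in.read() != -1) {
            seen.add(in.read());
        }
        return seen;
    }

    // Corrected pattern: one read() per iteration, result kept in an int.
    static List<Integer> readOncePerIteration(InputStream in) throws IOException {
        List<Integer> seen = new ArrayList<>();
        int b;
        while ((b = in.read()) != -1) {
            seen.add(b);
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Hi How are you!!!!!!".getBytes(StandardCharsets.US_ASCII);
        // Same ten values as the console output in the question
        System.out.println(readTwicePerIteration(new ByteArrayInputStream(data)));
        // All twenty bytes, starting with 72 ('H')
        System.out.println(readOncePerIteration(new ByteArrayInputStream(data)));
    }
}
```

The first list contains only the bytes at odd positions (i, H, w, a, ...), which is exactly what the question's console shows.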
Your misunderstanding comes from the fact that you think bytes are characters; they are not.
In order to read characters, you have to convert your bytes to characters first, and this is done using a process known as character encoding. An InputStream won't perform this operation, but a Reader will.
Therefore, try and:
final Path path = Paths.get("F:\\Files\\New.txt");
try (
final BufferedReader reader = Files.newBufferedReader(path,
StandardCharsets.UTF_8);
) {
int c;
while ((c = reader.read()) != -1)
System.out.println(c);
}
Also, in your original code, you read two bytes per loop.
I have a file named mark_method.json containing ABCDE in it and I am reading this file using the InputStream class.
By definition, the InputStream class reads an input stream of bytes. How does this work? I don't have bytes in the file, but characters?
I am trying to understand how a stream reading bytes is reading characters from the file?
public class MarkDemo {
public static void main(String args[]) throws Exception {
InputStream is = null;
try {
is = new FileInputStream("C:\\Users\\s\\Documents\\EB\\EB_02_09_2020_with_page_number_and_quote_number\\Old_images\\mark_method.json");
}
catch(Exception e) {
e.printStackTrace();
} finally {
if(is != null) {
is.close();
}
}
}
}
All data on a computer is stored as bits and bytes. The content of a file is also stored as bytes.
We have programs which convert these bytes into human-readable form, which is why we see the mark_method.json file as containing characters and not bytes.
A character is a byte (at least in ASCII).
Each byte from 0 to 127 has a character value. For example 0 is the null character, 0xa is \n, 0xd is \r, 0x41 is 'A' and so on.
The implementation only knows bytes. It doesn't know that the char 0x2709 is ✉. If the file is encoded as big-endian UTF-16, it only sees two bytes: 0x27 and 0x09. In UTF-8 the same character is stored as three bytes: 0xE2, 0x9C and 0x89.
Only the text editor interprets the bytes and shows the matching symbol or letter.
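To see which bytes actually end up on disk for a given character, one can encode it in memory and print the result. This is a small sketch; the class name EnvelopeBytes and the toHex helper are invented for the illustration.

```java
import java.nio.charset.StandardCharsets;

public class EnvelopeBytes {
    // Formats a byte array as space-separated, two-digit upper-case hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // & 0xFF turns the signed byte into its unsigned value 0..255
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String envelope = "\u2709"; // the envelope character
        System.out.println("UTF-16BE: " + toHex(envelope.getBytes(StandardCharsets.UTF_16BE))); // 27 09
        System.out.println("UTF-8:    " + toHex(envelope.getBytes(StandardCharsets.UTF_8)));    // E2 9C 89
        System.out.println("ASCII A:  " + toHex("A".getBytes(StandardCharsets.US_ASCII)));      // 41
    }
}
```

The same character produces different byte sequences depending on the encoding, which is why the stream itself cannot know what the bytes mean.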
I think what you are actually asking here is how to convert the bytes you read from a file using FileInputStream into a Java String object you can print and manipulate.
FileInputStream does not have any read methods for directly producing a String object so if that is what you want, you need to further manipulate the input you get.
Option one is to use the Scanner class:
Scanner scanner = new Scanner(is);
String word = scanner.next();
Another option is to read the bytes and use the constructor of the String class that works with byte array:
byte [] bytes = new byte[10];
is.read(bytes);
String text = new String(bytes);
Note that for simplicity I just assumed you can read 10 valid bytes from your file.
In real code you would need some logic to make sure you are reading correct number of bytes.
Also, if your file is not stored using your system default character set, you will need to specify the character set as a parameter to the String constructor.
Finally, you can use another wrapper class, BufferedReader that has a readLine function which takes care of all the logic needed to read bytes representing a line of text from a file and return them in a String.
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
String line = in.readLine();
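For the byte-array option, the "logic to make sure you are reading the correct number of bytes" could look roughly like this. It is a sketch under assumptions: the class name StreamToString and the readAll helper are hypothetical, the input is an in-memory stream rather than the file from the question, and the charset is passed explicitly instead of relying on the system default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamToString {
    // Reads the stream to the end and decodes the collected bytes with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[4]; // deliberately tiny so the loop runs more than once
        int n;
        // read(byte[]) returns how many bytes were actually placed in the buffer, or -1 at the end
        while ((n = in.read(buffer)) != -1) {
            collected.write(buffer, 0, n); // copy only the valid part of the buffer
        }
        return new String(collected.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        InputStream is = new ByteArrayInputStream("ABCDE".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(is, StandardCharsets.UTF_8)); // ABCDE
    }
}
```

Decoding once, after all bytes have been collected, also avoids cutting a multi-byte character in half at a buffer boundary.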
I'm currently teaching myself Java IO. I'm able to read basic ASCII characters from a .txt file, but when I get to other Latin-1 characters, or characters within the 255 range, it prints 194 instead of the correct decimal character code.
For example, I can read abcdefg from the txt file, but if I throw in a character like © I don't get 169; for some reason I get 194. I tried testing this by just printing all chars between 1 and 255 in a loop, and that works. Reading this input does not seem to, though, so I'm a little perplexed. I understand I can use a reader object or whatever, but I want to cover the basics first by learning the byte streams. Here is what I have:
InputStream io = null;
try{
io = new FileInputStream("thing.txt");
int yeet = io.read();
System.out.println(yeet);
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
UTF-8 encoding table and Unicode characters
You can see there that the UTF-8 hex code for © is c2 a9, i.e. 194 169. It seems that your file is encoded as UTF-8, where © takes two bytes, and you read only the first byte, which is 194.
P.S. Read a file character by character/UTF8 this is another good example of java encodings, code-points, etc.
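The difference between reading a byte and reading a decoded character can be shown in memory. The following is a minimal sketch, assuming the file is UTF-8; the class CopyrightBytes and its two helpers are names made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CopyrightBytes {
    // What a raw byte stream returns from its first read() call.
    static int firstByte(byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return in.read();
        }
    }

    // What a Reader returns after decoding the bytes with the given charset.
    static int firstChar(byte[] data, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            return reader.read();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "\u00A9".getBytes(StandardCharsets.UTF_8);        // bytes C2 A9
        byte[] latin1 = "\u00A9".getBytes(StandardCharsets.ISO_8859_1); // byte A9

        System.out.println(firstByte(utf8));                            // 194
        System.out.println(firstChar(utf8, StandardCharsets.UTF_8));    // 169
        System.out.println(firstByte(latin1));                          // 169
    }
}
```

So a byte stream only yields 169 for © if the file was saved as Latin-1; for a UTF-8 file the two bytes have to be decoded together.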
I have some solutions for you.
The first solution
There is a full understanding of the book on this site
The second solution
I have a sample code for you
public class Example {
public static void main(String[] args) throws Exception {
String str = "hey\u6366";
byte[] charset = str.getBytes("UTF-8");
String result = new String(charset, "UTF-8");
System.out.println(result);
}
}
Output:
hey捦
Let us walk through the program above. First we encode the given string as UTF-8 bytes using the getBytes() method:
String str = "hey\u6366";
byte[] charset = str.getBytes("UTF-8");
Then we decode the charset byte array back into a String by passing the same encoding to the String constructor:
String result = new String(charset, "UTF-8");
System.out.println(result);
Good luck
This is my first question on StackOverflow. Hope it's gonna be clear and detailed enough.
So I need to write 2 methods, encrypt and decrypt.
My encrypt function is:
public void cifra() throws FileNotFoundException,IOException {
FileInputStream in=new FileInputStream(file);
String s="";
int b;
while(in.read()!=-1) {
b=in.read()+key;
s+=b;
}
in.close();
PrintStream ps=new PrintStream(file);
ps.println(s);
ps.close();
}
My decrypt function is the same but with
b=in.read()-key;
But it doesn't work. The output file is not the same as the initial, unencrypted file.
Thanks for the help!
Change your while loop to this:
while ((b = in.read()) != -1) {
b += key;
s += b;
}
Currently you read twice per iteration, once in the while condition and once inside the loop body, so you are skipping every other byte.
in.read() is reading in a single byte of the file, as an integer. You are then converting that integer to a string via s+=b.
So say in.read() gives you 97 (ASCII for 'a') and your key is 5, you are turning around and writing literally 102 to the file, instead of an 'f', which would be the "encoded" character.
Your loop should be building a byte array (or byte stream) and you should write that byte array to the file.
Here are the docs for the ByteArrayOutputStream, which your loop should write to, which you can in-turn write to a file.
You are reading bytes (each one into an int).
A String, however, is not an array of bytes; it contains Unicode text and can combine Greek, Chinese and whatever else. (In fact, String uses chars, where every char is two bytes.) Turning external bytes into a String involves a conversion using some charset encoding. For arbitrary bytes that conversion can go wrong, and it uses more memory and is slower. Hence one generally does not use String here.
FileInputStream in = new FileInputStream(file);
ByteArrayOutputStream out = new ByteArrayOutputStream();
int b;
while((b = in.read()) !=-1) {
b = (b + key) % 256;
out.write(b);
}
in.close();
byte[] data = out.toByteArray();
FileOutputStream out2 = new FileOutputStream(file);
out2.write(data);
out2.close();
The other problem is that bytes have a range of 0 to 255 (or -128 to 127 as signed bytes).
Hence my %, the modulo operator. One also sees & 0xFF (bitwise AND with 255, 0b1111_1111).
Note that println(someInt) writes the textual representation of the integer: 'A', being int 65, is stored as "65", i.e. as 2 bytes: 54 and 53.
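A round trip through both directions can be checked in memory before touching any files. This is a sketch rather than the asker's actual class: ByteShift and shift are hypothetical names, and Math.floorMod is used in place of % so that a negative key (decryption) still lands in the range 0 to 255.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteShift {
    // Shifts every byte by key, wrapping around within 0..255.
    // Use a positive key to encrypt and the negated key to decrypt.
    static byte[] shift(byte[] input, int key) {
        byte[] out = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            int unsigned = input[i] & 0xFF; // signed byte -> 0..255
            out[i] = (byte) Math.floorMod(unsigned + key, 256);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] plain = "abc".getBytes(StandardCharsets.US_ASCII);
        byte[] encrypted = shift(plain, 5);
        byte[] decrypted = shift(encrypted, -5);

        System.out.println(new String(encrypted, StandardCharsets.US_ASCII)); // fgh
        System.out.println(Arrays.equals(plain, decrypted));                  // true
    }
}
```

Because the result stays a byte array the whole way through, nothing is ever turned into decimal digits, which is the mistake s += b makes.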
I am a beginner at Java, trying to figure out how to convert characters from a text file into integers. In the process, I wrote a program which generates a text file showing what characters are generated by what integers.
package numberchars;
import java.io.FileWriter;
import java.io.IOException;
import java.io.FileReader;
import java.lang.Character;
public class Numberchars {
public static void main(String[] args) throws IOException {
FileWriter outputStream = new FileWriter("NumberChars.txt");
//Write to the output file the char corresponding to the decimal
// from 1 to 255
int counter = 1;
while (counter <256)
{
outputStream.write(counter);
outputStream.flush();
counter++;
}
outputStream.close();
This generated NumberChars.txt, which had all the numbers, all the letters both upper and lower case, surrounded at each end by other symbols and glyphs.
Then I tried to read this file and convert its characters back into integers:
FileReader inputStream = new FileReader("NumberChars.txt");
FileWriter outputStream2 = new FileWriter ("CharNumbers.txt");
int c;
while ((c = inputStream.read()) != -1)
{
outputStream2.write(Character.getNumericValue(c));
outputStream2.flush();
}
}
}
The resulting file, CharNumbers.txt, began with the same glyphs as NumberChars.txt but then was blank. Opening the files in MS Word, I found NumberChars had 248 characters (including 5 spaces) and CharNumbers had 173 (including 8 spaces).
So why didn't the Character.getNumericValue(c) result in an integer written to CharNumbers.txt? And given that it didn't, why at least didn't it write an exact copy of NumberChars.txt? Any help much appreciated.
Character.getNumericValue doesn't do what you think it does. If you read the Javadoc:
Returns the int value that the specified character (Unicode code point) represents. For example, the character '\u216C' (the Roman numeral fifty) will return an int with a value of 50.
If the character has no numeric value it returns -1 (which looks like 0xFF_FF_FF_FF in two's complement).
Most characters don't have such a "numeric value," so you write the ints out, each padded to 2 bytes (more on that later), read them back in the same way, and then start writing a whole lot of 0xFFFF (-1 truncated to 2 bytes) courtesy of a misplaced Character.getNumericValue. I'm not sure what MS Word is doing, but it's probably getting confused what the encoding of your file is and glomming all those bytes into 0xFF_FF_FF_FF (because the high bits of each byte are set) and treating that as one character. (Use a text editor more suited to this kind of stuff like Notepad++, btw.) If you were to measure your file's size on disk in bytes it will probably still be 256 chars * 2 bytes/chars = 512 bytes.
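A few sample calls make the behaviour of Character.getNumericValue concrete and contrast it with a plain cast, which is what yields the character's code. The class name NumericValueDemo is invented for this sketch.

```java
public class NumericValueDemo {
    public static void main(String[] args) {
        // getNumericValue answers "which number does this character stand for?"
        System.out.println(Character.getNumericValue('7'));      // 7
        System.out.println(Character.getNumericValue('a'));      // 10 (letters count as digits 10..35)
        System.out.println(Character.getNumericValue('\u216C')); // 50 (Roman numeral fifty)
        System.out.println(Character.getNumericValue('!'));      // -1 (no numeric value)

        // A cast answers "what is this character's code?"
        System.out.println((int) '7'); // 55
        System.out.println((int) 'a'); // 97
        System.out.println((int) '!'); // 33
    }
}
```

If the goal is the integer code of each character, the value returned by read() already is that code and needs no conversion.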
I'm not sure what you meant to do here, so I'll note that InputStreamReader and OutputStreamWriter work on a (Unicode) character basis, with an encoder that defaults to the system one. That's why your ints are padded/truncated to 2 bytes. If you wanted pure byte IO, use FileInputStream/FileOutputStream. If you wanted to read and write the ints as Strings, you need to use FileWriter/FileReader, but not like you did.
// Just bytes
// This is a try-with-resources. It executes the code with the decls in it
// but is also like an implicit finally block that calls `close()` on each resource.
try(FileOutputStream fos = new FileOutputStream("bytes.bin")) {
for(int b = 0; b < 256; b++) { // Bytes are signed so we use int.
// This takes an int and truncates it for the lowest byte
fos.write(b);
// Can also fill a byte[] and dump it all at once with overloaded write.
}
}
byte[] bytes = new byte[256];
try(FileInputStream fis = new FileInputStream("bytes.bin")) {
// Reads up to bytes.length bytes into bytes
fis.read(bytes);
}
// Foreach loop. If you don't know what this does, I think you can figure out from the name.
for(byte b : bytes) {
System.out.println(b);
}
// As Strings
try(FileWriter fw = new FileWriter("strings.txt")) {
for(int i = 0; i < 256; i++) {
// You need a delimiter lest you not be able to tell 12 from 1,2 when you read
// Uses system default encoding
fw.write(Integer.toString(i) + "\n");
}
}
byte[] bytes = new byte[256];
try(
FileReader fr = new FileReader("strings.txt");
// FileReaders can't do stuff like "read one line to String" so we wrap it
BufferedReader br = new BufferedReader(fr);
) {
for(int i = 0; i < 256; i++) {
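// Byte.valueOf would throw NumberFormatException for 128..255, so parse as int and narrow
bytes[i] = (byte) Integer.parseInt(br.readLine());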
}
}
for(byte b : bytes) {
System.out.println(b);
}
public class MyCLAss {
public static void main(String[] args)
{
char x='b';
System.out.println(+x); // Writing a unary plus before the variable promotes the char to int, so this prints its ASCII value, 98.
}
}
I'm reading Eckel's book, IO chapter, and there is the following code (p. 667).
public static void main(String[] args) throws IOException {
try {
DataInputStream in = new DataInputStream(new ByteArrayInputStream(BufferedInputFile.read("src/io/FormattedMemoryInput.java").getBytes()));
while(true) {
System.out.print((char)in.readByte()); // problem line
}
} catch (EOFException ex) {
System.err.println("End of stream");
}
}
This code works great, but if I change (char) in.readByte() to in.readChar() it prints some Asian symbols: 灡捫慧攠楯㬊੩浰潲琠橡癡漮⨻੩浰. Why is that, and why doesn't it print the English ASCII symbols?
Why is that and why it doesn't print english ASCII symbols out?
From DataInput.readChar():
Reads two input bytes and returns a char value. Let a be the first byte read and b be the second byte. The value returned is:
(char)((a << 8) | (b & 0xff))
This method is suitable for reading bytes written by the writeChar method of interface DataOutput.
In other words, it's treating your file as if it's UTF-16-encoded - and it almost certainly isn't.
When you want to read text data you should use a Reader subclass, e.g. InputStreamReader wrapped around FileInputStream, specifying the appropriate encoding for the input data.
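The formula quoted from DataInput.readChar() can be checked against the start of the garbled output. In this sketch the class ReadCharDemo and its helpers are hypothetical names, and the word "package" stands in for the beginning of the source file being read.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadCharDemo {
    // readChar() glues two consecutive bytes into one 16-bit char.
    static char firstViaReadChar(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readChar();
        }
    }

    // A Reader decodes the bytes according to the stated encoding instead.
    static String decode(byte[] data, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "package".getBytes(StandardCharsets.US_ASCII);
        // 'p' is 0x70 and 'a' is 0x61, so readChar() yields U+7061
        System.out.println(Integer.toHexString(firstViaReadChar(data))); // 7061
        System.out.println(decode(data, StandardCharsets.US_ASCII));     // package
    }
}
```

U+7061 is the first of the CJK characters in the question's output, which matches the two bytes of "pa" being merged into a single char.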