I wrote some code to read a text file character by character and print it to the screen, but the result confused me.
This is the code I wrote:
import java.io.*;
import java.nio.charset.StandardCharsets;
public class learnIO
{
public static void main(String[] args) throws IOException{
var in = new InputStreamReader(new FileInputStream("test1.txt"), StandardCharsets.UTF_8);
while(in.read() != -1){
System.out.println((char)in.read());
}
}
}
the content and encoding scheme of the file:
file test1.txt
test1.txt: ASCII text
cat test1.txt
hello, world!
the result is:
e
l
,
w
r
d
Some characters are missing. Why did this happen?
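The read() method of InputStreamReader returns an int (4 bytes) and a char is 2 bytes, but the cast is not what loses characters: every char value fits in an int, and the int return type only exists so that -1 can signal end of stream.
The characters go missing because the loop calls read() twice per iteration, once in the while condition (where the result is discarded) and once inside println, so only every second character is printed.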
refer to https://docs.oracle.com/javase/7/docs/api/java/io/InputStreamReader.html
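For what it's worth, the pattern in the output (every other character missing) is what you would expect when read() is called twice per loop iteration. Here is a minimal sketch that reproduces it, using a StringReader in place of test1.txt so it runs anywhere. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadTwiceDemo {

    // Mirrors the loop from the question: two read() calls per iteration.
    static String readSkipping(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        while (in.read() != -1) {          // this character is read and discarded
            sb.append((char) in.read());   // only the following character is kept
        }
        return sb.toString();
    }

    // Calls read() once per iteration and reuses the stored result.
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Same content as the file in the question, including the trailing newline.
        String content = "hello, world!\n";
        System.out.print(readSkipping(new StringReader(content)));
        System.out.print(readAll(new StringReader(content)));
    }
}
```

With this input the first method keeps only e, l, the comma, w, r, d and the final newline, which matches the output shown in the question.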
You should wrap the InputStreamReader in a BufferedReader. The official Oracle documentation says:
An InputStreamReader is a bridge from byte streams to character streams: It reads bytes and decodes them into characters using a specified charset. The charset that it uses may be specified by name or maybe given explicitly, or the platform's default charset may be accepted.
Each invocation of one of an InputStreamReader's read() methods may cause one or more bytes to be read from the underlying byte-input stream.
To enable the efficient conversion of bytes to characters, more bytes may be read ahead from the underlying stream than are necessary to satisfy the current read operation.
For top efficiency, consider wrapping an InputStreamReader within a BufferedReader. For example:
BufferedReader in
= new BufferedReader(new InputStreamReader(System.in));
So your problem can be solved with the following code:
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream("hello.txt"), StandardCharsets.UTF_8))) {
    // Read the file character by character.
    // read() returns an int so that -1 can signal end of stream;
    // call it once per iteration and keep the result.
    int c;
    while ((c = br.read()) != -1) {
        // Print each character on its own line
        System.out.println((char) c);
    }
    // The reader and the underlying stream are closed automatically
} catch (IOException e) { // Catch any I/O error
    System.err.println("Error: " + e.getMessage());
}
Related
I am trying to read a binary file in Java using BufferedReader. I wrote that binary file using "UTF-8" encoding. The code for writing the binary file:
byte[] inMsgBin=null;
try {
inMsgBin = String.valueOf(cypherText).getBytes("UTF-8");
//System.out.println("CIPHER TEXT:FULL:BINARY WRITE: "+inMsgBin);
} catch (UnsupportedEncodingException ex) {
Logger.getLogger(EncDecApp.class.getName()).log(Level.SEVERE, null, ex);
}
try (FileOutputStream out = new FileOutputStream(fileName+ String.valueOf(new SimpleDateFormat("yyyyMMddhhmm").format(new Date()))+ ".encmsg")) {
out.write(inMsgBin);
out.close();
} catch (IOException ex) {
Logger.getLogger(EncDecApp.class.getName()).log(Level.SEVERE, null, ex);
}
System.out.println("cypherText charCount="+cypherText.length());
Here 'cypherText' is a String with some content. The total number of characters written to the file is reported as 19. Also, when I open the binary file in Notepad++ after writing, it shows some characters; selecting all of the file's content gives a count of 19 characters.
Now when I read the same file using BufferedReader, using the following lines of code:
try
{
DecMessage obj2= new DecMessage();
StringBuilder cipherMsg=new StringBuilder();
try (BufferedReader in = new BufferedReader(new FileReader(filePath))) {
String tempLine="";
fileSelect=true;
while ((tempLine=in.readLine()) != null) {
cipherMsg.append(tempLine);
}
}
System.out.println("FROM FILE: charCount= "+cipherMsg.length());
Here the total number of characters read (printed as 'charCount') is 17 instead of 19.
How can I read all the characters of the file correctly?
Specify the same charset when reading the file.
try (final BufferedReader br = Files.newBufferedReader(new File(filePath).toPath(),
StandardCharsets.UTF_8))
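To illustrate why the charset used for reading should match the one used for writing, here is a small sketch that decodes the same UTF-8 bytes twice, once with UTF-8 and once with ISO-8859-1. It works on an in-memory byte array rather than a file, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {

    // Decodes the given bytes with the given charset and returns the text.
    static String decode(byte[] bytes, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int ch;
            while ((ch = reader.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 'm', copyright sign, capital U with grave: 3 characters, 5 bytes in UTF-8
        String original = "m\u00A9\u00D9";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        String sameCharset = decode(utf8, StandardCharsets.UTF_8);
        String otherCharset = decode(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(sameCharset.length());   // 3
        System.out.println(otherCharset.length());  // 5
    }
}
```

A reader created without an explicit charset uses the platform default, so the character count can change from machine to machine even though the bytes in the file are identical.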
UPDATE
Now I understand your problem. Thanks for the file.
Again: your file is still readable in any text editor such as Notepad++. Because its content includes extended and control characters, some of them are displayed as unreadable symbols, but the file is still text.
Now back to your problem. There are two problems with your code.
First, you should specify the correct charset when reading the file. Readers are character readers: bytes are converted into characters while reading. If you specify a charset, the reader uses it; otherwise it uses the system default charset. So you should create the BufferedReader as follows:
try (final BufferedReader br = Files.newBufferedReader(new File(filePath).toPath(),
StandardCharsets.UTF_8))
Second, your content includes control characters. When you read the file line by line, BufferedReader.readLine() treats a line feed (\n), a carriage return (\r), or a carriage return followed by a line feed as a line terminator and does not include it in the returned string. That is why you get 17 instead of 19 (two of your characters are CR). To avoid this, read character by character:
int ch;
while ((ch = br.read()) > -1) {
buffer.append((char)ch);
}
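The difference between the two counts can be reproduced without the original file. The sketch below is only an illustration under that assumption: it counts the characters of the same in-memory string once with readLine() and once with read(). The class and method names are invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class LineTerminatorDemo {

    // Counts characters by summing the lengths of the lines returned by readLine().
    static int countWithReadLine(String text) throws IOException {
        int count = 0;
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                count += line.length();  // line terminators are not part of the line
            }
        }
        return count;
    }

    // Counts characters one at a time with read(); nothing is dropped.
    static int countWithRead(String text) throws IOException {
        int count = 0;
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            while (br.read() != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String text = "ab\rcd\ref";  // 8 characters, two of them CR
        System.out.println(countWithReadLine(text));  // 6
        System.out.println(countWithRead(text));      // 8
    }
}
```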
Overall, the method below returns the proper text.
static String readCyberText() {
StringBuilder buffer = new StringBuilder();
try (final BufferedReader br = Files.newBufferedReader(new File("C:\\projects\\test2201404221017.txt").toPath(),
StandardCharsets.UTF_8)){
int ch;
while ((ch = br.read()) > -1) {
buffer.append((char)ch);
}
return buffer.toString();
}
catch (IOException e) {
e.printStackTrace();
return null;
}
}
And you can test by
String s = readCyberText();
System.out.println(s.length());
System.out.println(s);
and output as
19
ia#
m©Ù6ë<«9K()il
Note: the length of the String is 19, but only 17 characters are visible in the output, because the console treats the two CR characters as line breaks rather than displaying them. The String itself contains all 19 characters.
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
System.in(Standard input stream)- gets the input from keyboard in bytes
InputStreamReader: Converts the bytes into Unicode characters/ converts the standard input into reader object to be used with BufferedReader
Finally BufferedReader: Used to read from character input stream(Input stream reader)
String c = br.ReadLine(); -- a method used to read characters from input stream and put them in the string in one go not byte by byte.
Is everything above right? Please correct anything that is wrong!
Nearly there, but this:
String c = br.readLine(); -- a method used to read characters from input stream and put them in the string in one go not byte by byte.
It reads characters from the input reader (BufferedReader doesn't know about streams) and returns a whole line in one go, not character by character. Think of it in layers, and "above" the InputStreamReader layer, the concept of "bytes" doesn't exist any more.
Also, note that you can read blocks of characters with a Reader without reading a line: read(char[], int, int) - the point of readLine() is that it will do the line ending detection for you.
(As noted in comments, it's also readLine, not ReadLine :)
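As a rough sketch of the layering and of block reads with read(char[], int, int): the example below stacks a BufferedReader on an InputStreamReader on a byte stream, then reads characters a few at a time without caring about line endings. A ByteArrayInputStream stands in for System.in so it can run unattended. The class name, method name and the block size of 4 are arbitrary choices for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class BlockReadDemo {

    // Bytes at the bottom, characters above the InputStreamReader layer.
    static String readInBlocks(InputStream bytes) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(bytes, StandardCharsets.UTF_8))) {
            char[] block = new char[4];
            int n;
            // read(char[], int, int) returns the number of chars read, or -1 at end
            while ((n = br.read(block, 0, block.length)) != -1) {
                sb.append(block, 0, n);  // line terminators are kept as-is
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8);
        System.out.print(readInBlocks(new ByteArrayInputStream(data)));
    }
}
```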
What is the purpose of BufferedReader, explanation?
BufferedReader is a Java class. Its class hierarchy is as follows:
java.lang.Object ==> java.io.Reader ==> java.io.BufferedReader
BufferedReader provides an efficient way to read content.
Let's have a look at the following example to understand.
import java.io.BufferedReader;
import java.io.FileReader;
public class Main {
public static void main(String[] args) {
BufferedReader contentReader = null;
int total = 0; // variable total hold the number that we will add
//Create instance of class BufferedReader
//FileReader is built in class that takes care of the details of reading content from a file
//BufferedReader adds buffering on top of that to make reading from a file more efficient.
try{
contentReader = new BufferedReader(new FileReader("c:\\Numbers.txt"));
String line = null;
while((line = contentReader.readLine()) != null)
total += Integer.valueOf(line);
System.out.println("Total: " + total);
}
catch(Exception e)
{
System.out.println(e.getMessage());
}
finally{
try{
if(contentReader != null)
contentReader.close();
}
catch (Exception e)
{
System.out.println(e.getMessage());
}
}
}
}
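The same idea can be written more compactly with try-with-resources, which closes the reader without an explicit finally block. This is only a sketch: the method takes any Reader, so a StringReader stands in for c:\\Numbers.txt here, and the class and method names are made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class SumLinesDemo {

    // Sums one integer per line; blank lines are skipped.
    static int sumLines(Reader source) throws IOException {
        int total = 0;
        // try-with-resources closes the reader even if parsing fails
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    total += Integer.parseInt(line);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Total: " + sumLines(new StringReader("1\n2\n3\n")));
    }
}
```

To read from a real file, pass a FileReader (or a reader created with an explicit charset) instead of the StringReader.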
In my Java application, I have to read a file. The problem I am facing is that, after reading the file, the result comes out in an unreadable format: some ASCII characters are displayed and none of the letters are readable. How can I make it display correctly?
try {
// Open the file (the path is hard-coded below)
FileInputStream fstream = new FileInputStream("c:\\hello.txt");
// Get the object of DataInputStream
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
// Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
System.out.println(strLine);
}
// Close the input stream
in.close();
} catch (Exception e) {// Catch exception if any
System.err.println("Error: " + e.getMessage());
}
Perhaps you have an encoding error. The constructor you are using for an InputStreamReader uses the default character encoding; if your file contains UTF-8 text outside the ASCII range, you will get garbage. Also, you don't need a DataInputStream, since you aren't reading any data objects from the stream. Try this code:
FileInputStream fstream = null;
try {
fstream = new FileInputStream("c:\\hello.txt");
// Decode data using UTF-8
BufferedReader br = new BufferedReader(new InputStreamReader(fstream, "UTF-8"));
String strLine;
// Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
System.out.println(strLine);
}
} catch (Exception e) {// Catch exception if any
System.err.println("Error: " + e.getMessage());
} finally {
if (fstream != null) {
try { fstream.close(); }
catch (IOException e) {
// log failure to close file
}
}
}
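On Java 7 and later, a shorter variant is possible with Files.newBufferedReader and try-with-resources. The following is a sketch, not a drop-in replacement: it assumes the file really is UTF-8, uses a temporary file instead of c:\\hello.txt so it can run anywhere, and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ReadUtf8FileDemo {

    // Reads all lines of a file, decoding the bytes as UTF-8.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        // The reader and the underlying file are closed automatically
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file with non-ASCII letters, then read it back
        Path tmp = Files.createTempFile("hello", ".txt");
        try {
            Files.write(tmp, "h\u00E9llo\nw\u00F6rld\n".getBytes(StandardCharsets.UTF_8));
            for (String line : readLines(tmp)) {
                System.out.println(line);
            }
        } finally {
            Files.delete(tmp);
        }
    }
}
```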
The output you are getting is an ASCII value, so you need to cast it to char or String before printing it. Hope this helps.
You have to handle it this way:
BufferedReader br = new BufferedReader(new InputStreamReader(in, encodingformat));
encodingformat: change it according to the encoding of the file you are reading.
Examples: UTF-8, UTF-16, and so on.
Refer to "Supported Encodings" for Java SE 6 for more info.
My problem got solved, though I don't know how. I copied the contents of hello.txt to another file and ran the Java program, and I could read all the letters. I don't know what the problem with the original file was.
Since you don't know which encoding the file is in, use jchardet to detect it and then use that encoding to read the file, as others have already suggested. This is not 100% foolproof, but it works for your scenario.
Also, use of DataInputStream is unnecessary.
I am new to Java text parsing and I'm wondering what the best way is to parse a file when the format of each line is known.
I have a file that has the following format for each line:
Int;String,double;String,double;String,double;String,double;String,double
Note how the String,double act as a pair separated by a comma and each pair is separated by a semicolon.
A few examples:
1;art,0.1;computer,0.5;programming,0.6;java,0.7;unix,0.3
2;291,0.8;database,0.6;computer,0.2;java,0.9;undegraduate,0.7
3;coffee,0.5;colombia,0.2;java,0.1;export,0.4;import,0.5
I'm using the following code to read each line:
public static void main(String args[]) {
try {
// Open the file that is the first
// command line parameter
FileInputStream fstream = new FileInputStream("textfile.txt");
// Get the object of DataInputStream
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
// Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
System.out.println(strLine);
}
// Close the input stream
in.close();
} catch (Exception e) {// Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
Thanks in advance :)
You could use the Scanner class, for starters:
A simple text scanner which can parse primitive types and strings using regular expressions.
If you are truly trying to do "C" style parsing, where is the buffer which contains the characters being accumulated for the "next" field? Where is the check that sees if the field separator was read, and where is the code that flushes the current field into the correct data structure once the end of line / field separator is read?
A character by character read loop in Java looks like
int readChar = 0;
while ((readChar = in.read()) != -1) {
// do something with the new readChar.
}
You can provide a pattern and use the Scanner
String input = "fish1-1 fish2-2";
java.util.Scanner s = new java.util.Scanner(input);
s.findInLine("(\\d+)");
java.util.regex.MatchResult result = s.match();
for (int i=1; i<=result.groupCount(); i++)
System.out.println(result.group(i));
s.close();
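Since the line format in the question is fixed (an int, then String,double pairs separated by semicolons), plain String.split may be enough. The sketch below parses one line under the assumptions that the strings never contain ';' or ',' and that each key appears once per line. ParsedLine and the other names are hypothetical, not part of any library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LineParserDemo {

    // Hypothetical holder for one parsed line: the leading int plus the pairs.
    static final class ParsedLine {
        final int id;
        final Map<String, Double> weights;

        ParsedLine(int id, Map<String, Double> weights) {
            this.id = id;
            this.weights = weights;
        }
    }

    // Parses "Int;String,double;String,double;..." into a ParsedLine.
    static ParsedLine parse(String line) {
        String[] fields = line.split(";");
        int id = Integer.parseInt(fields[0].trim());
        Map<String, Double> weights = new LinkedHashMap<>();  // keeps file order
        for (int i = 1; i < fields.length; i++) {
            String[] pair = fields[i].split(",", 2);
            if (pair.length != 2) {
                throw new IllegalArgumentException("Malformed pair: " + fields[i]);
            }
            weights.put(pair[0].trim(), Double.parseDouble(pair[1].trim()));
        }
        return new ParsedLine(id, weights);
    }

    public static void main(String[] args) {
        ParsedLine p = parse("1;art,0.1;computer,0.5;programming,0.6;java,0.7;unix,0.3");
        System.out.println(p.id + " -> " + p.weights);
    }
}
```

Inside the readLine() loop from the question, each strLine could be passed to parse instead of being printed.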
In Java, I am trying to parse an HTML file that contains complex text such as greek symbols.
I encounter a known problem when text contains a left facing quotation mark. Text such as
mutations to particular “hotspot” regions
becomes
mutations to particular “hotspot�? regions
I have isolated the problem by writing a simple text copy method:
public static int CopyFile()
{
try
{
StringBuffer sb = null;
String NullSpace = System.getProperty("line.separator");
Writer output = new BufferedWriter(new FileWriter(outputFile));
String line;
BufferedReader input = new BufferedReader(new FileReader(myFile));
while((line = input.readLine())!=null)
{
sb = new StringBuffer();
//Parsing would happen
sb.append(line);
output.write(sb.toString()+NullSpace);
}
return 0;
}
catch (Exception e)
{
return 1;
}
}
Can anybody offer some advice as how to correct this problem?
My solution:
InputStream in = new FileInputStream(myFile);
Reader reader = new InputStreamReader(in,"utf-8");
Reader buffer = new BufferedReader(reader);
Writer output = new BufferedWriter(new FileWriter(outputFile));
int r;
while ((r = buffer.read()) != -1)
{
if (r<126)
{
output.write(r);
}
else
{
output.write("&#"+Integer.toString(r)+";");
}
}
output.flush();
The file read is not in the same encoding (probably UTF-8) as the file written (probably ISO-8859-1).
Try the following to generate a file with UTF-8 encoding:
BufferedWriter output = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputFile),"UTF8"));
Unfortunately, determining the encoding of a file is very difficult. See Java : How to determine the correct charset encoding of a stream
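When the input encoding is known, a copy that names the charset on both the reading and the writing side avoids depending on the platform default. The following is a minimal sketch under that assumption, using in-memory streams in place of myFile and outputFile. The class and method names are invented for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetCopyDemo {

    // Copies text from in to out, decoding and encoding with explicit charsets.
    static void copyText(InputStream in, Charset inCharset,
                         OutputStream out, Charset outCharset) throws IOException {
        try (Reader reader = new BufferedReader(new InputStreamReader(in, inCharset));
             Writer writer = new BufferedWriter(new OutputStreamWriter(out, outCharset))) {
            int ch;
            while ((ch = reader.read()) != -1) {
                writer.write(ch);
            }
        }  // closing the writer flushes it
    }

    public static void main(String[] args) throws IOException {
        // Curly quotes written as escapes so the source file encoding does not matter
        String text = "mutations to particular \u201Chotspot\u201D regions";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyText(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)),
                 StandardCharsets.UTF_8, out, StandardCharsets.UTF_8);
        String copied = new String(out.toByteArray(), StandardCharsets.UTF_8);
        System.out.println(copied.equals(text));  // true
    }
}
```

For real files, a FileInputStream and a FileOutputStream would take the place of the byte array streams.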
In addition to what Thierry-Dimitri Roy wrote, if you know the encoding you have to create your FileReader with a bit of extra work. From the docs:
Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.
The Javadoc for FileReader says:
The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.
In your case the default character encoding is probably not appropriate. Find what encoding the input file uses, and specify it. For example:
FileInputStream fis = new FileInputStream(myFile);
InputStreamReader isr = new InputStreamReader(fis, "charset name goes here");
BufferedReader input = new BufferedReader(isr);