First character of the reading from the text file: ï»¿ [duplicate] - java

This question already has answers here:
Java read file got a leading BOM [ï»¿]
(7 answers)
Closed 9 years ago.
If I write this code, I get this as output --> this first: ï»¿
and then the other lines:
try {
    BufferedReader br = new BufferedReader(new FileReader("myFile.txt"));
    String line;
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
    br.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
How can I avoid it?

You are getting the characters ï»¿ on the first line because this sequence is the UTF-8 byte order mark (BOM). If a text file begins with a BOM, it's likely it was generated by a Windows program like Notepad.
To solve your problem, read the file explicitly as UTF-8 instead of whatever the default system character encoding happens to be (US-ASCII, etc.):
BufferedReader in = new BufferedReader(
        new InputStreamReader(
                new FileInputStream("myFile.txt"), "UTF-8"));
Then in UTF-8, the byte sequence ï»¿ decodes to one character, which is U+FEFF. This character is optional: a legal UTF-8 file may or may not begin with it. So we will skip the first character only if it's U+FEFF:
in.mark(1);
if (in.read() != 0xFEFF)
    in.reset();
And now you can continue with the rest of your code.
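Putting the pieces together, here is a minimal self-contained sketch; the helper name newBufferedReaderSkippingBom is my own invention, not a library method:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class BomAwareRead {
    // Opens a file as UTF-8 and skips a leading U+FEFF if present
    static BufferedReader newBufferedReaderSkippingBom(String path) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), "UTF-8"));
        in.mark(1);
        if (in.read() != 0xFEFF)
            in.reset(); // no BOM; rewind to the real first character
        return in;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader br = newBufferedReaderSkippingBom("myFile.txt")) {
            String line;
            while ((line = br.readLine()) != null)
                System.out.println(line);
        }
    }
}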

The problem could be in the encoding used.
Try this:
BufferedReader in = new BufferedReader(new InputStreamReader(
        new FileInputStream("yourfile"), "UTF-8"));

Related

discrepancy in java input output [duplicate]

This question already has answers here:
How can I read a large text file line by line using Java?
(22 answers)
How to write to Standard Output using BufferedWriter
(2 answers)
Closed 5 years ago.
String text;
try {
    PrintStream pw2 = new PrintStream(new FileOutputStream("C:\\Users\\jadit\\Desktop\\ts.doc"));
    InputStreamReader isr = new InputStreamReader(System.in);
    BufferedReader br = new BufferedReader(isr);
    text = br.readLine(); // Reading a String
    System.out.println(text);
    pw2.print(text);
    pw2.close();
    isr.close();
    br.close();
} catch (Exception e) {
    System.out.println(e);
}
int str;
try {
    FileInputStream fr2 = new FileInputStream("C:\\Users\\jadit\\Desktop\\ts.doc");
    BufferedInputStream br2 = new BufferedInputStream(fr2);
    PrintStream pw1 = new PrintStream(System.out, true);
    while ((str = br2.read()) >= 0)
        pw1.println(" " + str);
    fr2.close();
    pw1.close();
    br2.close();
} catch (Exception e) {}
output:
run:
a b c d
a b c d
97
32
98
32
99
32
100
32
If I try to read the contents of some other file, say t.txt, in the second try block, then it does not read or print that file's contents; but when I read the same file that was written in the first try block, it displays the contents shown above. The streams are closed in the first try block and new ones are opened in the second, so why is this happening? Can't we work with different files in the same program?
If my understanding of your requirement is right, you are:
1. Trying to read content from standard input and write it to a file.
2. Trying to read content from a file and write it to standard output.
Reading content from standard input and writing it to a file works, but you are having trouble reading content from a file and writing it to standard output. The following code will help you achieve the second part.
try {
    FileReader fr = new FileReader("C:\\Users\\jadit\\Desktop\\ts.doc");
    BufferedReader br = new BufferedReader(fr);
    String str = null;
    while ((str = br.readLine()) != null) {
        System.out.println(str);
    }
    br.close();
    fr.close();
} catch (Exception e) {
    e.printStackTrace();
}
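As a side note, on Java 7 and later the same read loop can be written with try-with-resources so the reader is closed automatically even on error (a minimal sketch, same path as above):

try (BufferedReader br = new BufferedReader(
        new FileReader("C:\\Users\\jadit\\Desktop\\ts.doc"))) {
    String str;
    while ((str = br.readLine()) != null) {
        System.out.println(str);
    }
} catch (IOException e) {
    e.printStackTrace();
}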
Well, your second try/catch block was printing the ASCII values of the text in the file because you are printing str without converting it to a character.
What you have to do is replace pw1.println(" " + str); with this:
char c = (char)str;
pw1.println(" "+c);
and it shall give you the content of the file instead of the ASCII values.
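For reference, here is a minimal sketch of the second try block with that fix applied (same file path as in the question; casting each byte to char assumes single-byte text such as ASCII):

try (BufferedInputStream br2 = new BufferedInputStream(
        new FileInputStream("C:\\Users\\jadit\\Desktop\\ts.doc"))) {
    PrintStream pw1 = new PrintStream(System.out, true);
    int b;
    while ((b = br2.read()) >= 0) {
        // read() returns the byte as an int; cast to char to print text
        pw1.println(" " + (char) b);
    }
} catch (IOException e) {
    e.printStackTrace();
}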

Java readline() skipping second line

I have a questions file that I'd like to read. As it is read, I want to identify the questions (as opposed to the answers) and print them; before each question there is a line of "#" characters. The code keeps skipping question one for some reason. What am I missing here?
Here is the code:
try {
    // Open the file that is the first command line parameter
    FileInputStream fstream = new FileInputStream(path);
    BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
    String strLine;
    strLine = br.readLine();
    System.out.println(strLine);
    // Read file line by line
    while (strLine != null) {
        strLine = strLine.trim();
        if ((strLine.length() != 0) && (strLine.charAt(0) == '#' && strLine.charAt(1) == '#')) {
            strLine = br.readLine();
            System.out.println(strLine);
            //questions[q] = strLine;
        }
        strLine = br.readLine();
    }
    // Close the input stream
    fstream.close();
    // System.out.println(questions[0]);
} catch (Exception e) { // Catch exception if any
    System.err.println("Error: " + e.getMessage());
}
I suspect that the file you are reading is UTF-8 with a BOM.
The BOM is a marker before the first character that helps identify the proper encoding of text files.
The issue with the BOM is that it is invisible and disturbs reading; a text file with a BOM is arguably no longer a plain text file. In particular, when you read the first line, the first character is no longer a #, but something different, because it is the character BOM+#.
Try to load the file with the explicit encoding specified. Java can handle the BOM in newer releases; I don't remember which exactly.
BufferedReader br = new BufferedReader(new InputStreamReader(fstream, "UTF-8"));
Otherwise, open the file in a decent text editor, like Notepad++, and change the encoding to UTF-8 without BOM, or to ANSI encoding (yuck).
Notice that whether you enter the if statement in the while loop or not, you then do strLine = br.readLine();, which overwrites the line you read when you initialized strLine.
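One way to restructure the loop so each line is read exactly once (a sketch, assuming the marker lines start with ## as in the original condition):

String strLine;
while ((strLine = br.readLine()) != null) {
    strLine = strLine.trim();
    if (strLine.startsWith("##")) {
        // The question text is on the line after the marker line
        String question = br.readLine();
        System.out.println(question);
    }
}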

Binary file not being read properly in Java

I am trying to read a binary file in Java using a BufferedReader. I wrote the binary file using "UTF-8" encoding. The code for writing to the binary file:
byte[] inMsgBin = null;
try {
    inMsgBin = String.valueOf(cypherText).getBytes("UTF-8");
    //System.out.println("CIPHER TEXT:FULL:BINARY WRITE: " + inMsgBin);
} catch (UnsupportedEncodingException ex) {
    Logger.getLogger(EncDecApp.class.getName()).log(Level.SEVERE, null, ex);
}
try (FileOutputStream out = new FileOutputStream(fileName
        + String.valueOf(new SimpleDateFormat("yyyyMMddhhmm").format(new Date()))
        + ".encmsg")) {
    out.write(inMsgBin);
    out.close();
} catch (IOException ex) {
    Logger.getLogger(EncDecApp.class.getName()).log(Level.SEVERE, null, ex);
}
System.out.println("cypherText charCount=" + cypherText.length());
System.out.println("cypherText charCount="+cypherText.length());
Here cypherText is a String with some content. The total number of characters written to the file is given as 19. Also, after writing, when I open the binary file in Notepad++ it shows some characters, and selecting all the content of the file counts to 19 characters in total.
Now when I read the same file using BufferedReader, with the following lines of code:
try {
    DecMessage obj2 = new DecMessage();
    StringBuilder cipherMsg = new StringBuilder();
    try (BufferedReader in = new BufferedReader(new FileReader(filePath))) {
        String tempLine = "";
        fileSelect = true;
        while ((tempLine = in.readLine()) != null) {
            cipherMsg.append(tempLine);
        }
    }
    System.out.println("FROM FILE: charCount= " + cipherMsg.length());
Here the total number of characters read (stored in charCount) is 17 instead of 19.
How can I read all the characters of the file correctly?
Specify the same charset while reading the file:
try (final BufferedReader br = Files.newBufferedReader(new File(filePath).toPath(),
        StandardCharsets.UTF_8))
UPDATE
Now I got your problem. Thanks for the file.
Again: your file is still readable by any text reader like Notepad++. Since your characters include extended and control characters, you see those unreadable glyphs, but it is still text.
Now back to your problem. You have two problems with your code.
First, while reading the file you should specify the correct charset. Readers are character readers: bytes are converted into characters while reading. If you specify the charset it will be used; otherwise the default system charset is used. So you should create the BufferedReader as follows:
try (final BufferedReader br = Files.newBufferedReader(new File(filePath).toPath(),
        StandardCharsets.UTF_8))
Second, your data includes control characters. While reading the file line by line, BufferedReader uses the system's default end-of-line characters as delimiters and skips them. That's why you are getting 17 instead of 19 (two of your characters are CR). To avoid this issue, read character by character:
int ch;
while ((ch = br.read()) > -1) {
    buffer.append((char) ch);
}
Overall, the method below returns the proper text:
static String readCyberText() {
    StringBuilder buffer = new StringBuilder();
    try (final BufferedReader br = Files.newBufferedReader(
            new File("C:\\projects\\test2201404221017.txt").toPath(),
            StandardCharsets.UTF_8)) {
        int ch;
        while ((ch = br.read()) > -1) {
            buffer.append((char) ch);
        }
        return buffer.toString();
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
}
And you can test it with:
String s = readCyberText();
System.out.println(s.length());
System.out.println(s);
and the output is:
19
ia#
m©Ù6ë<«9K()il
Note: the length of the String is 19, but only 17 characters appear on the first displayed line, because the console treated one of the control characters as a line break and showed the rest on a different line. The String contains all 19 characters properly.
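To convince yourself that the String really holds all 19 characters, you can dump each char's code (a quick check added here, not part of the original answer):

String s = readCyberText();
for (int i = 0; i < s.length(); i++) {
    // Print the index and hex code of every char, including control characters
    System.out.printf("%d: U+%04X%n", i, (int) s.charAt(i));
}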

java file reading issue

In my Java application, I have to read a file. The problem I am facing is that after reading the file, the result comes out in a non-readable format: some garbage/ASCII characters are displayed and none of the letters are readable. How can I make it display properly?
try {
    // Open the file that is the first command line parameter
    FileInputStream fstream = new FileInputStream("c:\\hello.txt");
    // Get the object of DataInputStream
    DataInputStream in = new DataInputStream(fstream);
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    String strLine;
    // Read file line by line
    while ((strLine = br.readLine()) != null) {
        // Print the content on the console
        System.out.println(strLine);
    }
    // Close the input stream
    in.close();
} catch (Exception e) { // Catch exception if any
    System.err.println("Error: " + e.getMessage());
}
Perhaps you have an encoding error. The constructor you are using for an InputStreamReader uses the default character encoding; if your file contains UTF-8 text outside the ASCII range, you will get garbage. Also, you don't need a DataInputStream, since you aren't reading any data objects from the stream. Try this code:
FileInputStream fstream = null;
try {
    fstream = new FileInputStream("c:\\hello.txt");
    // Decode data using UTF-8
    BufferedReader br = new BufferedReader(new InputStreamReader(fstream, "UTF-8"));
    String strLine;
    // Read file line by line
    while ((strLine = br.readLine()) != null) {
        // Print the content on the console
        System.out.println(strLine);
    }
} catch (Exception e) { // Catch exception if any
    System.err.println("Error: " + e.getMessage());
} finally {
    if (fstream != null) {
        try {
            fstream.close();
        } catch (IOException e) {
            // log failure to close file
        }
    }
}
The output you are getting is an ASCII value, so you need to cast it to a char or String before printing it. Hope this helps.
You have to handle it this way:
BufferedReader br = new BufferedReader(new InputStreamReader(in, encodingformat));
encodingformat: change it according to the type of encoding issue you encounter. Examples: UTF-8, UTF-16, and so on.
Refer to Supported Encodings by Java SE 6 for more info.
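If you are unsure which names your JVM accepts, you can list the supported charsets (a small check, not from the original answer):

import java.nio.charset.Charset;

public class ListCharsets {
    public static void main(String[] args) {
        // Charset.availableCharsets() returns a map keyed by canonical charset name
        for (String name : Charset.availableCharsets().keySet()) {
            System.out.println(name);
        }
    }
}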
My problem got solved, and I don't know how. I copied the hello.txt contents to another file and ran the Java program, and I could read all the letters. I don't know what the problem was.
Since you don't know which encoding the file is in, use jchardet to detect the encoding used by the file and then use that encoding to read it, as others have already suggested. This is not 100% foolproof but works for your scenario.
Also, the use of DataInputStream is unnecessary.
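For illustration, here is a rough sketch of encoding detection using the juniversalchardet library, a Java port closely related to jchardet (the UniversalDetector API shown is that library's; treating it as a stand-in for jchardet is my assumption):

import java.io.FileInputStream;
import java.io.IOException;
import org.mozilla.universalchardet.UniversalDetector;

public class DetectEncoding {
    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[4096];
        try (FileInputStream fis = new FileInputStream("c:\\hello.txt")) {
            UniversalDetector detector = new UniversalDetector(null);
            int nread;
            // Feed bytes to the detector until it is confident or the file ends
            while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
                detector.handleData(buf, 0, nread);
            }
            detector.dataEnd();
            // May be null if detection failed
            System.out.println("Detected encoding: " + detector.getDetectedCharset());
        }
    }
}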
