Read a text file until EOL in Java - java

I am trying to read a text file that contains:
hello James!
How are you today!
I want to read each character in the string until I find the EOL character. I am on Windows, where EOL is represented by \r\n. How can I write a condition that goes through all the characters of the string and prints them one by one until it reaches EOL (\r\n)?
int readedValue;
do
{
while((readedValue = bufferReader.read()) != 10)
{
//readedValue = bufferReader.read();
char ch = (char) readedValue;
System.out.print(ch);
}
}
while ((readedValue = bufferReader.read()) != -1);
When I read the file now, I get the output hello James!ow are you today!
The 'H' in How is missing. What can I change to get the complete text?

As people have noted, the readLine() method reads up to the next line separator and returns the line with the separator removed. So your tests for '\n' and '\r' in line cannot possibly evaluate to true.
But you can easily add an extra end-of-line when you output the line string [1].
[1] That is, unless you actually need to preserve the exact end-of-line characters from the input stream.
You ask:
Instead of using readline(), Is there any way i can use buffer reader to read each character and print them?
Yes, sure. The read() method returns either one character (as an int) or -1 to indicate EOF. So:
int ch = br.read();
while (ch != -1) {
System.out.print((char) ch);
ch = br.read();
}
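To stop at the end of the first line instead of at EOF, the same loop can also test for the terminator characters. The loop in the question loses the 'H' because its outer while condition reads one character and throws it away. The following is only a minimal sketch: the class and method names (FirstLine, readFirstLine) are made up for illustration, and a StringReader stands in for the file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class FirstLine {
    // Reads one character at a time and returns everything before the
    // first '\r' or '\n' (or before EOF if there is no terminator).
    public static String readFirstLine(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int ch;
        // Every character is read exactly once, in the loop condition,
        // so nothing is consumed and then discarded.
        while ((ch = reader.read()) != -1 && ch != '\r' && ch != '\n') {
            sb.append((char) ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Illustrative input; a FileReader could be passed instead.
        Reader in = new StringReader("hello James!\r\nHow are you today!");
        System.out.println(readFirstLine(in)); // prints: hello James!
    }
}
```

Testing for both '\r' and '\n' means the same code handles Windows (\r\n) and Unix (\n) files.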

You can use something like this:
while((line=input.readLine())!=null) {
// do something
}
If you want to read char by char, you can use this:
int readedValue;
while ((readedValue = reader.read()) != -1) {
char ch = (char) readedValue;
// do something
}
Here is an example (with a string instead of a file) for your new problem:
String line;
int readedValue;
String s = "hello James!\r\nHow are you today!";
StringReader input = new StringReader(s);
BufferedReader lineReader= new BufferedReader (input);
while((line=lineReader.readLine())!=null) {
StringReader input2 = new StringReader(line);
BufferedReader charReader= new BufferedReader (input2);
while((readedValue = charReader.read()) != -1) {
char ch = (char) readedValue;
System.out.print(ch);
}
}

Your problem is about magic numbers.
Your while loop will run forever whenever charAt(21)!='\n' && charAt(22)!='\r' is true, because the two indices never change.
Those two integers should be increased inside the loop:
charAt(i)!='\n' && charAt(i+1)!='\r'
// inside the loop
i++

Related

BufferedReader always resets to last character in a file

I am having trouble resetting a BufferedReader to an arbitrary point while reading a txt file. My example input text in the file is: number = 10;.
The last statement, System.out.println((char)c);, prints ; instead of 1.
What am I doing wrong in this simple example?
My code:
String filePath = "data.txt";
BufferedReader br = new BufferedReader(new FileReader(filePath));
String line = "";
int c = 0;
while ((c = br.read()) != -1) {
if((char)c == '1') {
br.mark(1000);
}
System.out.print((char)c);
}
br.reset();
c = br.read();
System.out.println((char)c);
Running your code against a dummy .txt file containing the line number=10;, c = br.read() returns '0'.
The java.io.BufferedReader.mark(int) method marks the current position in the stream. Invoking reset() repositions the stream to that point.
By the time you call mark(), the '1' has already been read, so the mark sits just after it. Essentially, it prints the character after the one you used as the mark.
If you wish to print the rest of the line you could use:
String line = br.readLine();
If you do want to get the 1 for some reason, you could simply modify your code as follows (this works only because the character read after the reset is '0', and '0' + 1 is '1'):
c = br.read()+1;
I tried your code; first of all, it prints '0' in my case.
When you call reset(), it puts the cursor after the 1, not before it.
To get 1 as the output, you need to put the mark on the character before the 1.
For example, I wanted to print r, so I put the mark on e, as shown below:
while ((c = br.read()) != -1) {
if((char)c == 'e') {
br.mark(1000);
}
System.out.print((char)c);
}
br.reset();
c = br.read();
System.out.println((char)c);
Hope it makes things clear.
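Another way to land on the target character itself is to call mark() before each read() and stop marking once the target has been seen. This is only a sketch under assumptions: the names MarkBeforeRead and rereadFrom are hypothetical, a StringReader replaces the file, and the input is assumed to be shorter than the 1000-character read-ahead limit.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class MarkBeforeRead {
    // Reads the whole text, then resets to the position just BEFORE the
    // first occurrence of target and returns the character read there.
    public static char rereadFrom(String text, char target) throws IOException {
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            boolean found = false;
            while (true) {
                if (!found) {
                    // Remember the position before the next character.
                    // The mark stays valid for up to 1000 further characters.
                    br.mark(1000);
                }
                int c = br.read();
                if (c == -1) {
                    break;
                }
                if (c == target) {
                    found = true; // keep the mark where it is from now on
                }
            }
            if (!found) {
                throw new IllegalArgumentException("target not found: " + target);
            }
            br.reset();
            return (char) br.read();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(rereadFrom("number = 10;", '1')); // prints: 1
    }
}
```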

Removing linefeeds using java

I have a text file with a mix of newlines (CR/LF or \r\f or "\n") and formfeeds (LF or \f) in a tab \t delimited file. The newlines appear as the expected, "\n" but the formfeeds are also used as internal field delimiters. Example:
COL_1 COL_2 COL_3 COL_4
1 A\fB C\fD 2
Using Java I was able to remove the formfeeds only after I set line.separator to \r - for CR/LF or \r\f and then reading in the file using the FileReader.read() checking for '\n':
private void fixMe() throws Exception{
FileReader in = new FileReader("C:\\somefile.txt");
FileWriter out = new FileWriter("C:\\someotherfile.txt");
System.setProperty("line.separator","\r");
try {
int c;
while (( c = in.read()) != -1 ) {
if ( c != '\n' ) {
out.write(c);
}
}
}
...
It appears that in.read has a default setting to read "\n" as two characters. I can remove \f but now I'll have to write another method to change \r to a "\n" and reset line.separator as part of the method. Is there a better way to do this? I want to use Scanner, but the solution points at resetting the line.separator setting again which I want to avoid.
A better way is to read the whole file content, remove "\n", "\r\n" and "\f", and then save the result wherever you want.
See example:
String content = new String(Files.readAllBytes(Paths.get("path-to-file")));
String processedContent = content.replaceAll("\\n|\\r\\n|\\f", "");
From your question, it seems you want to skip a line feed (LF, '\n', which the question writes as \f) unless it is part of a CRLF pair (\r\n), so keeping track of the last character read might solve your issue.
private void fixMe() throws Exception{
FileReader in = new FileReader("C:\\somefile.txt");
FileWriter out = new FileWriter("C:\\someotherfile.txt");
// Character 10 is LF ('\n') and character 13 is CR ('\r')
try {
int c;
int prevCharRead = 0;
while ((c = in.read()) != -1 ) {
if(c==10 && prevCharRead!=13){
// A bare LF with no CR before it: skip it, or implement whatever logic you want.
}else
out.write(c);
prevCharRead = c;
}
}
...
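The previous-character idea can be tried without touching the file system by working on a Reader and a Writer. This is a hedged sketch, not the asker's final code: it assumes the internal delimiter is a bare line feed ('\n'), as the code above does, and the names BareLfStripper and stripBareLf are invented for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class BareLfStripper {
    // Copies in to out, dropping every '\n' that is not directly
    // preceded by '\r'. CRLF pairs are written through unchanged.
    public static void stripBareLf(Reader in, Writer out) throws IOException {
        int prev = -1; // no previous character yet
        int c;
        while ((c = in.read()) != -1) {
            boolean bareLf = (c == '\n' && prev != '\r');
            if (!bareLf) {
                out.write(c);
            }
            prev = c;
        }
    }

    public static void main(String[] args) throws IOException {
        // One tab-delimited record with bare LFs inside two fields.
        StringWriter out = new StringWriter();
        stripBareLf(new StringReader("1\tA\nB\tC\nD\t2\r\n"), out);
        System.out.println(out.toString().equals("1\tAB\tCD\t2\r\n")); // prints: true
    }
}
```

For real files, a FileReader and FileWriter (ideally wrapped in BufferedReader and BufferedWriter) can be passed in, and no change to line.separator is needed because no line-oriented API is involved.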

How to read multiple lines using FileReader only?

I have the following code:
public class Reader {
public static void main(String[] args) throws IOException {
try (FileReader in = new FileReader("D:/test.txt")) {
// BufferedReader br = new BufferedReader(in);
int line = in.read();
for (int i = 0; i < line; i++) {
//System.out.println(line);
System.out.println((char) line);
line = in.read();
}
}
}
}
and a file Test.txt with the content:
Hello
Java
When I run above code it only reads Hello. I would like to read multiple lines using FileReader only. I don't want to use BufferedReader or InputStreamReader etc. Is that possible?
I don't think this version of the code prints "Hello".
You are calling:
int line = in.read();
What does this do? Look in the Javadocs for Reader:
public int read()
throws IOException
Reads a single character. This method will block until a character is available, an I/O error occurs, or the end
of the stream is reached.
(emphasis mine)
Your code reads the 'H' from 'Hello', which is 72 in ASCII.
Then it goes into your loop, with line==72, so it goes into the loop:
for(int i=0;i<line;i++)
... making the decision "is 0 less than 72? Yes, so I'll go into the loop block".
Then each time it reads a character the value of line changes to another integer, and each time loop goes around i increments. So the loop says "Keep going for as long as the ASCII value of the character is greater than the number of iterations I've counted".
... and each time it goes around, it prints that character on a line of its own.
As it happens, for your input, it reads end-of-file (-1), and as -1 < i, the loop continue condition is not met.
But for longer inputs it stops on the first 'a' after the 97th character, or the first 'b' after the 98th character, and so on (because ASCII 'a' is 97, etc.)
H
e
l
l
o
J
a
v
a
This isn't what you want:
You don't want your loop to repeat until i >= "the character I just read". You want it to repeat until in.read() returns -1. You have probably been taught how to loop until a condition is met.
You don't want to println() each character, since that adds newlines you don't want. Use print().
You should also look at the Reader.read(char[] cbuf) method, and see if you can write the code to work in bigger chunks.
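For reference, Reader offers read(char[] cbuf), which fills the array and returns the number of characters actually read, or -1 at end of stream. One possible shape for chunked reading is sketched below; the class name ChunkedRead, the method name readAll and the 4096 buffer size are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ChunkedRead {
    // Reads everything from the reader in chunks and returns it as a String.
    public static String readAll(Reader in) throws IOException {
        char[] buffer = new char[4096];
        StringBuilder sb = new StringBuilder();
        int n;
        // read(char[]) returns how many chars were stored, or -1 at EOF.
        while ((n = in.read(buffer)) != -1) {
            // Append only the n chars that were actually filled in.
            sb.append(buffer, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A FileReader could be passed here instead of the StringReader.
        System.out.print(readAll(new StringReader("Hello\r\nJava")));
    }
}
```

Using the returned count matters: the tail of the buffer beyond n still holds stale or zero characters.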
Two patterns you'll use over and over again in your programming career are:
Type x = getSomehow();
while(someCondition(x)) {
doSomethingWith(x);
x = getSomehow();
}
... and ...
Type x = value_of_x_which_meets_condition;
while(someCondition(x)) {
x = getSomehow();
doSomethingWith(x);
}
See if you can construct something with FileReader and the value you get from it, filling in the "somehows".
Reading a file character by character without any buffering stream is extremely inefficient. I would probably wrap the FileReader in a BufferedReader or simply use Scanner to read the content of the file, but if you absolutely want/need/have to use only FileReader, then you can try
int line = in.read();
while (line != -1) {
System.out.print((char) line);
line = in.read();
}
instead of your for (int i = 0; i < line; i++) {...} loop.
Read slim's answer carefully. In short: the loop condition shouldn't depend on whether the number of characters read so far is less than the numeric value of the current character (i < line). Consider a file like
My name
is
not important now
This file has a few characters that you normally will not see, like \r and \n, and in reality it looks like
My name\r\n
\r\n
is\r\n
\r\n
not important now
where the numeric value of \n is 10 (and of \r is 13). My name\r\n is 9 characters (\r and \n are each a single character), and the \r after it is the 10th, so when the loop checks the next character, the second \n, i is 10 and line is also 10, and your condition i<line fails (10<10 is not true).
So instead of checking i<line, you should check that the value read is not EOF (End of File, or End of Stream in our case), which is represented by -1 as specified in the read method documentation, so your condition should look like line != -1. And because you don't need i, just use a while loop here.
Returns:
The character read, or -1 if the end of the stream has been reached
You will have to read the content char by char and parse for a new line sequence.
A new line sequence can be any of the following:
a single carriage return '\r'
a single line feed '\n'
a carriage return followed by a line feed "\r\n"
EDIT
You could try the following:
public List<String> readLinesUsingFileReader(String filename) throws IOException {
List<String> lines = null;
try (FileReader fileReader = new FileReader(filename)) {
lines = readLines(fileReader);
}
return lines;
}
private List<String> readLines(FileReader fileReader) throws IOException {
List<String> lines = new ArrayList<>();
boolean newLine = false;
int c, p = 0;
StringBuilder line = new StringBuilder();
while(-1 != (c = fileReader.read())) {
if(c == '\n' && p != '\r') {
newLine = true;
} else if(c == '\r') {
newLine = true;
} else {
if(c != '\n' && c != '\r') {
line.append((char) c);
}
}
if(newLine) {
lines.add(line.toString());
line = new StringBuilder();
newLine = false;
}
p = c;
}
if(line.length() > 0) {
lines.add(line.toString());
}
return lines;
}
Note that the code above reads the whole file into a List, this might not be well suited for large files! You may want in such a case to implement an approach which uses streaming, i.e. read one line at a time, for example String readNextLine(FileReader fileReader) { ... }.
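A streaming variant could look roughly like the sketch below. It is an assumption-laden illustration rather than a drop-in replacement: because a "\r\n" pair can straddle two calls, the state is kept in a small wrapper class (here called NextLineReader, a hypothetical name) instead of a bare readNextLine(FileReader) method.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class NextLineReader {
    private final Reader in;
    // True right after a '\r' ended a line, so a following '\n' is ignored.
    private boolean skipLf = false;

    public NextLineReader(Reader in) {
        this.in = in;
    }

    // Returns the next line without its terminator, or null at end of input.
    public String readNextLine() throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (skipLf) {
                skipLf = false;
                if (c == '\n') {
                    continue; // second half of a "\r\n" pair
                }
            }
            if (c == '\r') {
                skipLf = true;
                return line.toString();
            }
            if (c == '\n') {
                return line.toString();
            }
            line.append((char) c);
        }
        // EOF: return a final unterminated line if there is one.
        return line.length() > 0 ? line.toString() : null;
    }

    public static void main(String[] args) throws IOException {
        // A FileReader could be wrapped here instead of the StringReader.
        NextLineReader reader = new NextLineReader(new StringReader("Hello\n,\r\nWorld!"));
        String line;
        while ((line = reader.readNextLine()) != null) {
            System.out.println(line);
        }
    }
}
```

Only one line is held in memory at a time, which is the point of the streaming approach.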
Some basic tests:
Create test files to read
private final static String txt0 = "testnl0.txt";
private final static String txt1 = "testnl1.txt";
private final static String txt2 = "testnl2.txt";
@BeforeClass
public static void genTestFile() throws IOException {
try (OutputStream os = new FileOutputStream(txt0)) {
os.write((
"Hello\n" +
",\r\n" +
"World!" +
"").getBytes());
}
try (OutputStream os = new FileOutputStream(txt1)) {
os.write((
"\n" +
"\r\r" +
"\r\n" +
"").getBytes());
}
try (OutputStream os = new FileOutputStream(txt2)) {
os.write((
"").getBytes());
}
}
Test using the created files
@Test
public void readLinesUsingFileReader0() throws IOException {
List<String> lines = readLinesUsingFileReader(txt0);
Assert.assertEquals(3, lines.size());
Assert.assertEquals("Hello", lines.get(0));
Assert.assertEquals(",", lines.get(1));
Assert.assertEquals("World!", lines.get(2));
}
@Test
public void readLinesUsingFileReader1() throws IOException {
List<String> lines = readLinesUsingFileReader(txt1);
Assert.assertEquals(4, lines.size());
Assert.assertEquals("", lines.get(0));
Assert.assertEquals("", lines.get(1));
Assert.assertEquals("", lines.get(2));
Assert.assertEquals("", lines.get(3));
}
@Test
public void readLinesUsingFileReader2() throws IOException {
List<String> lines = readLinesUsingFileReader(txt2);
Assert.assertTrue(lines.isEmpty());
}
If the file contains newline characters, they are read and printed like any other character:
public static void main(String[]args) throws IOException{
FileReader in = new FileReader("D:/test.txt");
char [] a = new char[50];
int n = in.read(a); // reads up to 50 characters into the array and returns how many were read
for(int i = 0; i < n; i++)
System.out.print(a[i]); //prints the characters one by one
in.close();
}
It will print
Hello
Java
I solved the above problem by using this code
public class Reader
{
public static void main(String[]args) throws IOException{
try (FileReader in = new FileReader("D:/test.txt")) {
int line = in.read();
while(line!=-1)
{
System.out.print((char)line);
line = in.read();
} }
}
}
But there is one more question: if I write a for loop instead of the while, like this
for(int i=0;i<line;i++)
It prints only the first line. Could anybody tell me why?
Reader.read() returns the int code of a single char, or -1 if the end of the file is reached:
http://docs.oracle.com/javase/7/docs/api/java/io/Reader.html#read()
So, read the file char by char and check for LF (line feed, '\n', 0x0A, 10 in decimal), CR (carriage return, '\r', 0x0D, 13 in decimal) and the end-of-file code.
Note: Windows uses 2 chars to encode the end of line: "\r\n". Most other systems, including Linux and macOS, use only "\n".
final StringBuilder line = new StringBuilder(); // line buffer
try (FileReader in = new FileReader("D:/test.txt")) {
int chAr, prevChar = 0x0A; // chAr - just read char, prevChar - previously read char
while (prevChar != -1) { // until the last read char is EOF
chAr = in.read(); // read int code of the next char
switch (chAr) {
case 0x0D: // CR - just
break; // skip
case -1: // EOF
if (prevChar == 0x0A) {
break; // no need a new line if EOF goes right after LF
// or no any chars were read before (prevChar isn't
// changed from its initial 0x0A)
}
case 0x0A: // or LF
System.out.println("line:" + line.toString()); // get string from the line buffer
line.setLength(0); // cleanup the line buffer
break;
default: // if any other char code is read
line.append((char) chAr); // append to the line buffer
}
prevChar = chAr; // remember the current char as previous one for the next iteration
}
}

BufferedReader readLine() issue: detecting end of file and empty return lines

I want my program to do something when it finds the end of a file (EOF) at the end of the last line of text, and something else when the EOF is at the empty line AFTER that last line of text. Unfortunately, BufferedReader seems to consider both cases equal.
For example, this is my code to read the lines to the end of the file:
FileReader fr = new FileReader("file.txt");
BufferedReader br = new BufferedReader(fr);
String line;
while((line = br.readLine()) != null) {
if (line.equals("")) {
System.out.println("Found an empty line at end of file.");
}
}
If file.txt contained this, it wouldn't print:
line of text 1
another line 2//cursor
This wouldn't print either:
line of text 1
another line 2
//cursor
However, this will:
line of text 1
another line 2
//cursor
What reader can I use to differentiate the first two cases?
You can use the BufferedReader.read(char[] cbuf, int off, int len) method. When the end of the file is reached (return value -1), you can check whether the last buffer read ended with a line separator.
Admittedly, the code would be more complicated as it will have to manage the construction of lines from the read char[] buffers.
You'll have to use read rather than readLine and handle end-of-line detection yourself. readLine considers \n, \r, and EOF all to be line terminators, and doesn't include the terminator in what it returns, so you can't differentiate on the basis of the returned string.
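If all that is needed is to tell the first two cases apart, it may be enough to remember the last character seen before EOF. This is a minimal sketch, assuming the whole input can simply be consumed; TrailingNewlineCheck and endsWithLineTerminator are names invented for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class TrailingNewlineCheck {
    // Consumes the reader and reports whether the last character
    // before EOF was a line terminator ('\n' or '\r').
    public static boolean endsWithLineTerminator(Reader in) throws IOException {
        int last = -1; // stays -1 for empty input
        int c;
        while ((c = in.read()) != -1) {
            last = c;
        }
        return last == '\n' || last == '\r';
    }

    public static void main(String[] args) throws IOException {
        // EOF directly after the last line of text.
        System.out.println(endsWithLineTerminator(
                new StringReader("line of text 1\nanother line 2"))); // prints: false
        // EOF on the empty line after the last line of text.
        System.out.println(endsWithLineTerminator(
                new StringReader("line of text 1\nanother line 2\n"))); // prints: true
    }
}
```

In practice the Reader would be a BufferedReader over the file, so the per-character reads stay cheap.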
public ArrayList<String> readFile(String inputFilename) throws IOException {
BufferedReader br = new BufferedReader(new FileReader(inputFilename));
ArrayList<String> lines = new ArrayList<>();
String currentLine = "";
int currentCharacter = br.read();
int lastCharacter = -1;
// Loop through each character read.
while (currentCharacter != -1) {
// Skip carriage returns.
if (currentCharacter != '\r') {
// Add the currentLine at each line feed and then reset currentLine.
if (currentCharacter == '\n') {
lines.add(currentLine);
currentLine = "";
} else {
// Add each non-line-separating character to the currentLine.
currentLine += (char) currentCharacter;
}
}
// Keep track of the last read character and continue reading the next
// character.
lastCharacter = currentCharacter;
currentCharacter = br.read();
}
br.close();
// If the currentLine is not empty, add it to the end of the ArrayList.
if (!currentLine.isEmpty()) {
lines.add(currentLine);
}
// If the last read character was a line feed, add another String to the end
// of the ArrayList.
if (lastCharacter == '\n') {
lines.add("");
}
return lines;
}
I tried reading from a BufferedReader that received its input from a socket input stream.
Everything worked fine until the last line, where the readLine() would just simply hang because the browser wouldn't send a newline terminator on post data.
This is my solution, to be able to read until the end of the input stream.
public String getLine(BufferedReader in)
{
StringBuilder builder = new StringBuilder();
try {
while(in.ready()) {
char input = (char)in.read();
/**
* This method only matches on "\r\n" as a new line indicator.
* change as needed for your own line terminators
*/
if(input == '\r') {
/** If we can read more, read one more character
* If that's a newline, we break and return.
* if not, we add the carriage return and let the
* normal program flow handle the read character
*/
if(in.ready()) {
input = (char)in.read();
if(input == '\n') {
break;
}
else {
builder.append('\r');
}
}
}
builder.append(input);
}
}
catch(IOException ex) {
System.out.println(ex.getMessage());
}
return builder.toString();
}
You can use @hmjd's solution or any other reader that can read byte by byte.
If you want to stick with reading line by line, you can use this.
boolean EOF = (currentLine = bufferedReader.readLine()) == null;
while(!EOF){
// do things that happen whether or not the next read hits EOF
EOF = (currentLine = bufferedReader.readLine()) == null;
if(!EOF){
// do things that happen only when it is not EOF
}else{
// do things that happen only when it is EOF
}
}
Why not use
if (line.length()==0) {
System.out.println("Found an empty line.");
}
Note: this will detect a blank line anywhere in the file, not just at EOF.

How to find out which line separator BufferedReader#readLine() used to split the line?

I am reading a file via the BufferedReader
String filename = ...
br = new BufferedReader( new FileInputStream(filename));
while (true) {
String s = br.readLine();
if (s == null) break;
...
}
I need to know whether the lines are separated by '\n' or '\r\n'.
Is there a way I can find out?
I don't want to open the FileInputStream so to scan it initially.
Ideally I would like to ask the BufferedReader since it must know.
I am happy to override the BufferedReader to hack it but I really don't want to open the filestream twice.
Thanks,
Note: the current line separator (returned by System.getProperty("line.separator") ) can not be used as the file could have been written by another app on another operating system.
To be in phase with the BufferedReader class, you may use the following method that handles \n, \r, \n\r and \r\n end line separators:
public static String retrieveLineSeparator(File file) throws IOException {
char current;
String lineSeparator = "";
FileInputStream fis = new FileInputStream(file);
try {
while (fis.available() > 0) {
current = (char) fis.read();
if ((current == '\n') || (current == '\r')) {
lineSeparator += current;
if (fis.available() > 0) {
char next = (char) fis.read();
if ((next != current)
&& ((next == '\r') || (next == '\n'))) {
lineSeparator += next;
}
}
return lineSeparator;
}
}
} finally {
if (fis!=null) {
fis.close();
}
}
return null;
}
After reading the Java docs (I confess to being a Pythonista), it seems that there isn't a clean way to determine the line-ending convention used in a specific file.
The best thing I can recommend is that you use BufferedReader.read() and iterate over every character in the file. Something like this:
String filename = ...
BufferedReader br = new BufferedReader(new FileReader(filename));
StringBuilder l = new StringBuilder();
int c;
while ((c = br.read()) != -1) {
if (c == '\n') {
// the line ended with a bare \n
// do stuff, not sure what you want with the endl encoding
l.setLength(0); // start the next line
} else if (c == '\r') {
br.mark(1); // remember this position so one character can be un-read
if (br.read() == '\n') {
// do extra stuff since you know that you've got a \r\n
} else {
br.reset(); // bare \r: put the extra character back
}
l.setLength(0); // start the next line
} else {
l.append((char) c);
}
}
br.close();
BufferedReader.readLine() does not provide any means of determining what the line break was. If you need to know, you'll need to read characters in yourself and find line breaks yourself.
You may be interested in the internal LineBuffer class from Guava (as well as the public LineReader class it's used in). LineBuffer provides a callback method void handleLine(String line, String end) where end is the line break characters. You could probably base something to do what you want on that. An API might look something like public Line readLine() where Line is an object that contains both the line text and the line end.
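An API of that shape can be sketched with the standard library alone. The code below is not Guava's LineBuffer; it is a hypothetical illustration (TerminatorAwareReader and Line are made-up names) that uses java.io.PushbackReader to look one character past a '\r'.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;

public class TerminatorAwareReader {
    // A line of text together with the terminator that ended it.
    public static final class Line {
        public final String text;
        public final String end; // "\n", "\r", "\r\n", or "" at EOF

        Line(String text, String end) {
            this.text = text;
            this.end = end;
        }
    }

    private final PushbackReader in;

    public TerminatorAwareReader(Reader reader) {
        // A pushback buffer of one char is enough to undo the lookahead.
        this.in = new PushbackReader(reader, 1);
    }

    // Returns the next line and its terminator, or null at end of input.
    public Line readLine() throws IOException {
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                return new Line(text.toString(), "\n");
            }
            if (c == '\r') {
                int next = in.read();
                if (next == '\n') {
                    return new Line(text.toString(), "\r\n");
                }
                if (next != -1) {
                    in.unread(next); // not part of the terminator
                }
                return new Line(text.toString(), "\r");
            }
            text.append((char) c);
        }
        return text.length() > 0 ? new Line(text.toString(), "") : null;
    }

    public static void main(String[] args) throws IOException {
        TerminatorAwareReader r = new TerminatorAwareReader(new StringReader("a\r\nb\nc"));
        Line line;
        while ((line = r.readLine()) != null) {
            System.out.println(line.text + " ended by " + line.end.length() + " char(s)");
        }
    }
}
```

The stream is opened only once, and the caller can inspect end on every line, which also covers files with mixed line endings.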
BufferedReader does not accept FileInputStreams
No, you cannot find out the line terminator character that was used in the file being read by BufferedReader. That information is lost while reading the file.
Unfortunately all answers below are incorrect.
Edit: And yes you can always extend BufferedReader to include the additional functionality you desire.
The answer would be: you can't find out what the line ending was.
I am looking for what can cause line endings in the same function. After looking at the BufferedReader source code, I can say that BufferedReader.readLine ends a line on '\r' or '\n' and skips a leftover '\n' that follows a '\r'. This is hardcoded and does not care about settings.
If you happen to be reading this file into a Swing text component then you can just use the JTextComponent.read(...) method to load the file into the Document. Then you can use:
textComponent.getDocument().getProperty( DefaultEditorKit.EndOfLineStringProperty );
to get actual EOL string that was used in the file.
Maybe you could use Scanner instead.
You can pass regular expressions to Scanner#useDelimiter() to set custom delimiter.
String regex="(\r)?\n";
String filename=....;
Scanner scan = new Scanner(new FileInputStream(filename));
scan.useDelimiter(Pattern.compile(regex));
while (scan.hasNext()) {
String str= scan.next();
// todo
}
You could use this code below to convert BufferedReader to Scanner
new Scanner(bufferedReader);
Not sure if useful, but sometimes I need to find out the line delimiter after I've read the file already far-down the road.
In this case I use this code:
/**
* <h1> Identify which line delimiter is used in a string </h1>
*
* This is useful when processing files that were created on different operating systems.
*
* @param str - the string with the mystery line delimiter.
* @return the line delimiter for windows, {@code \r\n}, <br>
* unix/linux {@code \n} or legacy mac {@code \r} <br>
* if none can be identified, it falls back to unix {@code \n}
*/
public static String identifyLineDelimiter(String str) {
if (str.matches("(?s).*(\\r\\n).*")) { //Windows //$NON-NLS-1$
return "\r\n"; //$NON-NLS-1$
} else if (str.matches("(?s).*(\\n).*")) { //Unix/Linux //$NON-NLS-1$
return "\n"; //$NON-NLS-1$
} else if (str.matches("(?s).*(\\r).*")) { //Legacy mac os 9. Newer OS X use \n //$NON-NLS-1$
return "\r"; //$NON-NLS-1$
} else {
return "\n"; //fallback onto '\n' if nothing matches. //$NON-NLS-1$
}
}
If you are using groovy, you can simply do:
def lineSeparator = new File('path/to/file').text.contains('\r\n') ? '\r\n' : '\n'
