I have an InputStream that reads from a text file. I noticed that it doesn't seem to read blank lines.
Sample text file:
[This is

A test file
Here.]
The code:
while ((str = br.readLine())!= null) {
System.out.println(str);
}
Some text files have several blank lines in between. How do I get the input stream to accept them?
As can be seen from the sample text file, '[This is' is followed by an empty line and then by 'A test file'. How do I read the empty line between the two strings? (That is what I mean by line break / break line.)
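Do you mean this?
while ((str = br.readLine()) != null) {
    // Skip blank lines inside the loop; testing them in the loop
    // condition would stop reading at the first blank line.
    if (str.trim().length() > 0) {
        System.out.println(str);
    }
}
This gives you all the non-blank lines.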
BufferedReader will read all the lines in the file regardless of whether they're blank or not. It looks for line breaks: CR+LF on Windows and LF on GNU/Linux. According to the BufferedReader documentation:
A line is considered to be terminated
by any one of a line feed ('\n'), a
carriage return ('\r'), or a carriage
return followed immediately by a
linefeed.
So, it depends on what your text file really looks like. Does it really have carriage-returns and line-feeds between the lines or is it just displaying that way? You can find this out by looking at the file in a Hex editor (LF is 0x0A and CR is 0x0D). If so, then BufferedReader should be giving you those blank lines.
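To check this without a hex editor, a minimal sketch along these lines may help. The class name BlankLineDemo, the helper readAllLines, and the in-memory sample text are assumptions made for illustration; a StringReader stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class BlankLineDemo {
    // Collects every line readLine() returns, including empty ones.
    static List<String> readAllLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String str;
            while ((str = br.readLine()) != null) {
                lines.add(str); // a blank line arrives as "", never as null
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the sample file, with CR+LF line endings.
        String text = "[This is\r\n\r\nA test file\r\nHere.]";
        for (String line : readAllLines(text)) {
            // Quotes make the empty line visible in the output.
            System.out.println("\"" + line + "\"");
        }
    }
}
```

If the blank line shows up here but not in your program, the cause is more likely in the code that processes the lines than in BufferedReader.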
Sorry for the mistake, it's a BufferedReader wrapping an InputStream. The BufferedReader did read every line, including the blank lines. The problem was in my own algorithm, which simply skipped them. Sorry for the hassle and trouble.
Thanks for all the answers.
Easiest demonstrated with an example:
String test = "salut ð\u009F\u0098\u0085 test";
Scanner scan = new Scanner(test);
System.out.println("1:" + scan.nextLine());
System.out.println("2:" + scan.nextLine());
This was a string in user input so unfortunately I'm not 100% sure what that unicode is, but if I recall correctly, it was an emoji (I saw the message when it was sent).
The output is:
1:salut ð
2: test
My expected output is just one line (i.e. the example code should throw a NoSuchElementException because the second nextLine() should fail). Why is it parsing as two lines? What is a potential workaround?
When I open the file in a text editor it correctly does not treat that unicode as a new line.
Why is it parsing as two lines?
Although this is an uncommon code point, the Unicode name of U+0085 is NEXT LINE (NEL), so it can reasonably be considered a newline character.
But is there a reason BufferedReader and text editors like Sublime Text don't parse it as an actual new line, while Scanner does?
If you look at the respective documentations of Scanner and BufferedReader:
Scanner.nextLine:
Advances this scanner past the current line and returns the input that was skipped. This method returns the rest of the current line, excluding any line separator at the end. The position is set to the beginning of the next line.
Since this method continues to search through the input looking for a line separator...
BufferedReader.readLine:
Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
Scanner.nextLine just says "line separator", a very vague term (it certainly doesn't refer to the Unicode category "Line Separators", which only has one code point), whereas the BufferedReader.readLine documentation states exactly what a line is.
Considering how Scanner also handles localised number formats and stuff, my guess is that it is designed to be a "smarter" class than BufferedReader.
Looking at the source code of my version of the JDK, Scanner considers the following strings "line separators":
\r\n
\n
\r
\u2028
\u2029
\u0085
The reason why \u0085 is considered a new line character is apparently related to XML parsing.
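As a possible workaround, here is a minimal sketch that compares the two classes on a string containing U+0085. The class and method names are assumptions made for illustration, and the test string is a simplified version of the one in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Scanner;

class NextLineDemo {
    // First line as Scanner sees it: U+0085 is treated as a line separator.
    static String firstLineWithScanner(String text) {
        try (Scanner scan = new Scanner(text)) {
            return scan.nextLine();
        }
    }

    // First line as BufferedReader sees it: only \n, \r, or \r\n end a line.
    static String firstLineWithReader(String text) {
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            return br.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String test = "salut \u0085 test";
        // Lengths are printed because U+0085 itself is invisible on most consoles.
        System.out.println("Scanner: " + firstLineWithScanner(test).length() + " chars");
        System.out.println("Reader:  " + firstLineWithReader(test).length() + " chars");
    }
}
```

If the input should only ever be split on \n, \r, or \r\n, reading it through a BufferedReader as above is probably the simplest option.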
Well I know this is a pretty basic question but somehow I am not able to figure it out.
I have an input similar to say:
line1
line2
line3
line4
All the lines have a new line character at the end except line4 i.e. I have pressed ENTER after each line except line4. Now if I provide this as an input to the BufferedReader, it reads out only the first 3 lines and skips the last line.
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String line;
while((line=br.readLine())!=null){
System.out.println(line);
}
This is the code I am trying to use. I don't think there is any problem with the code itself; I suspect the missing newline after the last line is causing the problem.
Can someone help me with this.
From BufferedReader's readLine() javadoc:
Reads a line of text. A line is considered to be terminated by any one
of a line feed ('\n'), a carriage return ('\r'), or a carriage return
followed immediately by a linefeed.
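Meaning, readLine() can only return a line once it has seen one of those terminators or the end of the stream. When you type into System.in interactively, the last line is not returned until you press ENTER or signal end of input (Ctrl+D on Unix, Ctrl+Z followed by ENTER on Windows).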
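A quick way to see how readLine() treats an unterminated last line is to feed it a stream that really does end. The sketch below is only an illustration: the class name LastLineDemo and the helper countLines are made up, and a StringReader stands in for System.in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LastLineDemo {
    // Counts how many lines readLine() returns before it reports end of stream.
    static int countLines(String text) {
        int count = 0;
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            while (br.readLine() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // line4 has no trailing newline, as in the question.
        System.out.println(countLines("line1\nline2\nline3\nline4"));
    }
}
```

Because the StringReader reaches its end, line4 is returned even though nothing terminates it. A console that is still open never reaches that point, which would explain why the loop appears to skip the last line.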
My program takes a text file as input, with words separated by newlines, processes the data, and must then write the result to a new file while also keeping the console output.
I am wondering why, when I append "\n" to my StringBuilder, the console shows a new line, but the output file does not and puts all the words on one line.
Only when I use newLine do I get a new line in both the console output and the output file. Why is that? What does (String)System.getProperty("line.separator") do that causes this?
// getProperty already returns a String, so no cast is needed
String newLine = System.getProperty("line.separator");
// try-with-resources closes the reader even if readLine() throws
try (BufferedReader fileIn = new BufferedReader(new FileReader(fileName))) {
    stringBuilder.append(newLine);
    String s;
    while ((s = fileIn.readLine()) != null) {
        stringBuilder.append(s);
        stringBuilder.append(newLine); // using newLine instead of "\n"
    }
} catch (IOException e) {
    e.printStackTrace();
}
String a = stringBuilder.toString();
Because on some systems (Linux/Unix) a new line is defined as \n while on others (Windows) it is \r\n. Depending on the software reading the text, it may choose to adhere to this or be more "forgiving", recognizing either or even \r individually.
Relevant Wikipedia text (https://en.wikipedia.org/wiki/Newline):
Systems based on ASCII or a compatible character set use either LF
(Line feed, '\n', 0x0A, 10 in decimal) or CR (Carriage return, '\r',
0x0D, 13 in decimal) individually, or CR followed by LF (CR+LF,
'\r\n', 0x0D0A)
This is also why you can retrieve the system-defined line separator from the System class as you did, instead of, for example, having it be some constant in the String class.
System.getProperty("line.separator") is different from "\n" in that the former returns the OS line separator (not always \n). \n is just a line feed, and when you open your output file in a program that does not interpret a lone \n as a new line (say, older versions of Notepad on Windows) you won't see that new line.
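To see which characters actually end up in the text, a small sketch like the following may help. The class name and both helpers are assumptions made for illustration; System.lineSeparator() returns the same value as the line.separator property.

```java
import java.util.Arrays;
import java.util.List;

class LineSeparatorDemo {
    // Joins the words, ending each one with the platform line separator.
    static String joinWithPlatformSeparator(List<String> words) {
        StringBuilder sb = new StringBuilder();
        for (String word : words) {
            sb.append(word).append(System.lineSeparator());
        }
        return sb.toString();
    }

    // Replaces CR and LF with visible escape sequences for inspection.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String text = joinWithPlatformSeparator(Arrays.asList("alpha", "beta"));
        // The escapes printed here depend on the operating system.
        System.out.println(visible(text));
    }
}
```

Printing the escaped form on both the "\n" version and the newLine version of the output should make the difference between the two files obvious.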
my file contains this string:
a
b
c
Now I want to read it and split it on the empty lines, so I have this:
text.split("\n\n"); where text is the content of the file
The problem is that this doesn't work. When I convert the new line to bytes, I see that "\n\n" is represented as 10 10, but the new lines in my file are represented by 10 13 10 13. So how can I split my file?
Escape   Description            ASCII value
\n       Line Feed (LF)         10
\r       Carriage Return (CR)   13
So you need to try string.split("\n\r") in your case.
Edit
If you want to split by empty line, try \n\r\n\r. Or you can use .readLine() to read your file, and skip all empty lines.
Are you sure it's 10 13 10 13? It always should be 13 10...
Also, you should not depend on line.separator too much. If you are processing files that come from a *nix platform, the separator in the file is \n even when your program runs on Windows, and the reverse is also true. Even on Windows, some editors use \n as the newline character. So I suggest using higher-level methods, or string.replaceAll("\r\n", "\n") to normalize your input.
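The normalization idea can be sketched as follows. The class name and the helper splitOnBlankLines are made up for illustration, and the sketch assumes the file uses \r\n, \r, or \n line endings.

```java
import java.util.Arrays;

class BlankLineSplitDemo {
    // Normalizes line endings to \n, then splits wherever one or more
    // blank lines occur (two or more consecutive newlines).
    static String[] splitOnBlankLines(String text) {
        String normalized = text.replaceAll("\r\n|\r", "\n");
        return normalized.split("\n{2,}");
    }

    public static void main(String[] args) {
        // Hypothetical file content with Windows line endings.
        String text = "a\r\n\r\nb\r\n\r\nc";
        System.out.println(Arrays.toString(splitOnBlankLines(text)));
    }
}
```

Normalizing first means the split pattern does not need to know which platform produced the file.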
Keep in mind, sometimes you have to use:
System.getProperty("line.separator");
to get the line separator, if you want your code to be platform independent. You can also use BufferedWriter's newLine() method, which takes care of that automatically.
Try using:
text.split("\n\r");
Why are you splitting on \n\n?
You should be splitting on \r\n because that's what the file lines are separated by.
Try to use regular expressions, something like:
text.split("\\W+");
text.split("\\s+");
LF: Line Feed, U+000A
CR: Carriage Return, U+000D
so you need to try to use
"string".split("\r\n");
Use a Scanner object instead of worrying about chars/bytes.
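One solution is to split on "\n" and ignore the empty strings:
String[] lines = text.split("\n"); // split() returns an array, not a List
for (String line : lines) {
    line = line.trim(); // also removes a trailing \r
    if (!line.isEmpty()) { // compare content, not references (!= "")
        System.out.println(line);
    }
}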
I've been having lots of trouble trying to get either a scanner or a buffered reader to try and detect a blank line. For example if I have a file that contains:
there
cat
dog
(BLANK LINE)
If I do this:
while( scan.hasNextLine() )
{
String line = scan.nextLine();
...
...
}
The scanner doesn't pick up the blank line. I tried to use a buffered reader also but I run into this issue. Is there some way the scanner can just return a "" whenever it finds a blank line like that? Cheers
Your input has as many lines as it has \n characters. Given the input
"there\ncat\ndog\n"
the next-lines will be correctly divided as
"there\n"
"cat\n"
"dog\n"
(In other words, there is no fourth blank line, since it is not terminated by a \n.)
Put differently, after the "dog\n" has been read, the scanner (or buffered reader for that matter) has reached EOF and there's not even an empty line to return. (Note that when the lines are returned, the new-line character is stripped off.)
So, since this is the expected behavior, I don't know what the easiest fix is. I suspect that the best way to solve this is simply to append a \n to the input, so that the loop runs an extra iteration.
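If the whole input is already available as a String, another option is to split it with a negative limit, which keeps trailing empty strings. The sketch below is only an illustration under that assumption; the class and method names are made up.

```java
import java.util.Arrays;

class TrailingBlankDemo {
    // A negative limit tells split() to keep trailing empty strings,
    // so the "line" after the final terminator shows up as "".
    static String[] linesIncludingTrailingEmpty(String text) {
        return text.split("\r\n|\r|\n", -1);
    }

    // The default split() silently drops trailing empty strings.
    static String[] linesDefault(String text) {
        return text.split("\r\n|\r|\n");
    }

    public static void main(String[] args) {
        String input = "there\ncat\ndog\n";
        System.out.println(Arrays.toString(linesDefault(input)));
        System.out.println(Arrays.toString(linesIncludingTrailingEmpty(input)));
    }
}
```

With the negative limit the array ends with an empty string for input that ends in a line terminator, which is the "" the question asks for.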