I have a tab-delimited (\t) text file with a mix of line terminators: each record ends with CR/LF ("\r\n"), but bare LF characters (which I call formfeeds below and write as \f) are also used as delimiters inside fields. Example:
COL_1 COL_2 COL_3 COL_4
1 A\fB C\fD 2
Using Java, I was able to remove the formfeeds only after I set line.separator to "\r" (for CR/LF) and then read the file with FileReader.read(), skipping every '\n':
private void fixMe() throws Exception{
FileReader in = new FileReader("C:\\somefile.txt");
FileWriter out = new FileWriter("C:\\someotherfile.txt");
System.setProperty("line.separator","\r");
try {
int c;
while (( c = in.read()) != -1 ) {
if ( c != '\n' ) {
out.write(c);
}
}
}
...
It appears that in.read() by default reads "\n" as two characters. I can remove the \f characters, but now I'll have to write another method to change \r back to "\n" and reset line.separator as part of that method. Is there a better way to do this? I want to use Scanner, but that solution also seems to require resetting line.separator, which I want to avoid.
A better way is to read the whole file content, remove "\n", "\r\n" and "\f", and then save the result wherever you want.
See example:
String content = new String(Files.readAllBytes(Paths.get("path-to-file")));
String processedContent = content.replaceAll("\\n|\\r\\n|\\f", "");
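A self-contained sketch of the same idea, including the "save where you want" step. It assumes UTF-8 and a file small enough to hold in memory, and the class and method names are made up for illustration. As with the snippet above, every "\n", "\r\n" and "\f" is removed, so the record-ending "\r\n" disappears too and the output is one long line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class StripBreaks {
    // Removes "\r\n", "\n" and "\f" (same alternatives as above, "\r\n" listed first).
    static String strip(String content) {
        return content.replaceAll("\\r\\n|\\n|\\f", "");
    }

    // Reads the whole file, strips the characters, and saves the result to target.
    // Assumption: UTF-8 encoding; the whole file is held in memory.
    static void stripFile(Path source, Path target) throws IOException {
        String content = new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
        Files.write(target, strip(content).getBytes(StandardCharsets.UTF_8));
    }
}
```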
From your question, it seems you want to skip bare line feeds (LF, '\n', which you wrote as '\f') but keep the LF that is part of a CRLF ("\r\n") pair, so keeping track of the last character read should solve your issue.
private void fixMe() throws Exception{
FileReader in = new FileReader("C:\\somefile.txt");
FileWriter out = new FileWriter("C:\\someotherfile.txt");
// Character 10 is LF ('\n'); character 13 is CR ('\r')
try {
int c;
int prevCharRead = 0;
while ((c = in.read()) != -1 ) {
if(c==10 && prevCharRead!=13){
// a bare line feed (LF, '\n') not preceded by a CR ('\r'): skip it, or implement whatever logic you want
}else
out.write(c);
prevCharRead = c;
}
}
...
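The same previous-character idea as a self-contained sketch, written against Reader/Writer so it can be tried on in-memory strings. The class and method names are hypothetical; for real files you would pass something like new BufferedReader(new FileReader(...)) and new BufferedWriter(new FileWriter(...)).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class BareLfFilter {
    private static final int CR = '\r'; // 13
    private static final int LF = '\n'; // 10

    // Copies in to out, dropping every LF that is not immediately preceded by CR,
    // so "\r\n" record terminators survive while bare LFs inside fields are removed.
    static void copyWithoutBareLf(Reader in, Writer out) throws IOException {
        int prev = -1; // nothing read yet
        int c;
        while ((c = in.read()) != -1) {
            boolean bareLf = (c == LF && prev != CR);
            if (!bareLf) {
                out.write(c);
            }
            prev = c;
        }
    }

    // Convenience wrapper for in-memory text.
    static String removeBareLf(String text) {
        StringWriter out = new StringWriter();
        try (Reader in = new StringReader(text)) {
            copyWithoutBareLf(in, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringReader/StringWriter do not fail in practice
        }
        return out.toString();
    }
}
```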
Related
I am reading a file with comma-separated values which, when split into an array, will have 10 values for each line. I expected the file to have line breaks so that
line = bReader.readLine()
would give me each line. But my file doesn't have line breaks. Instead, after the first set of values there are lots of spaces (465, to be precise), and then the next line begins.
So my readLine() code above reads the entire file in one go, as there are no line breaks. Please suggest how best to tackle this scenario efficiently.
One way is to replace each run of 465 spaces in your text with a newline character "\n" before reading it line by line.
I second Ninan's answer: replace the 465 spaces with a newline, then run the function you were planning on running earlier.
For aesthetics and readability, I would suggest using a regex Pattern to replace the spaces instead of String.replace() with an unreadable 465-space literal.
Your code could look like the following, but with 6 replaced by 465:
public static void main(String[] args)
{
String content = "DOG,CAT      MOUSE,CHEESE"; // six spaces between the two "lines"
Pattern p = Pattern.compile("[ ]{6}",
Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
String newString = p.matcher(content).replaceAll("\n");
System.out.println(newString);
}
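A slightly fuller sketch of the replace-then-readLine approach, assuming records are separated by runs of exactly runLength spaces (465 in the question) and values never contain commas or line breaks. The class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SpaceRunSplitter {
    // Replaces each run of exactly runLength spaces with "\n", then reads the result
    // line by line with readLine(), as originally planned, splitting each line on commas.
    static List<String[]> parse(String content, int runLength) {
        Pattern run = Pattern.compile(" {" + runLength + "}");
        String withNewlines = run.matcher(content).replaceAll("\n");
        List<String[]> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(withNewlines))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {          // a longer gap can leave an empty line
                    records.add(line.split(","));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen for an in-memory string
        }
        return records;
    }
}
```

For the real file you would first read the content (for example with Files.readAllBytes) and then call parse(content, 465).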
My suggestion is to read file f1.txt and write it to another file, f2.txt, removing all empty lines and leading/trailing spaces, then read f2.txt. Something like:
try (BufferedReader br = new BufferedReader(new FileReader("f1.txt"));
     BufferedWriter fw = new BufferedWriter(new FileWriter("f2.txt"))) {
String line;
while((line = br.readLine()) != null)
{
line = line.trim(); // remove leading and trailing whitespace
if (!line.equals("")) // don't write out blank lines
{
fw.write(line, 0, line.length());
fw.newLine(); // keep each non-blank line on its own line
}
}
}
Then try using your code.
You might create your own subclass of a FilterInputStream or a PushbackInputStream and pass that to an InputStreamReader. One overrides int read().
Such a class unfortunately needs a bit of typing. (A nice exercise, so to speak.)
private static final int NO_CHAR = -2;
private boolean fromCache;
private int cachedSpaces;
private int cachedNonSpaceChar = NO_CHAR;
@Override
public int read() throws IOException {
if (fromCache) {
if (cachedSpaces > 0) ...
if (cachedNonSpaceChar != NO_CHAR) ...
...
}
int ch = super.read();
if (ch != -1) {
...
}
return ch;
}
The idea is to cache spaces until a non-space char arrives. In read(), either take the next char from the cache, return \n in place of a long run of spaces, or call super.read() when nothing is cached, reading on (recursively) while the chars are spaces.
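One way the elided parts could be filled in, as a sketch rather than the answer's own code: it uses FilterReader instead of FilterInputStream (so it works on characters, not bytes), replaces the recursive read with a loop, and turns any run of at least minRun spaces into a single '\n' while passing shorter runs through unchanged. The class name is hypothetical.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical filter: any run of at least minRun spaces becomes a single '\n'.
class SpaceRunToNewlineReader extends FilterReader {
    private static final int NO_CHAR = -2;    // "nothing cached"
    private final int minRun;
    private int cachedSpaces = 0;             // spaces of a short run still to be returned
    private int cachedNonSpaceChar = NO_CHAR; // char (or -1 for EOF) that ended the run

    SpaceRunToNewlineReader(Reader in, int minRun) {
        super(in);
        this.minRun = minRun;
    }

    @Override
    public int read() throws IOException {
        if (cachedSpaces > 0) {               // 1. replay a short run of spaces
            cachedSpaces--;
            return ' ';
        }
        if (cachedNonSpaceChar != NO_CHAR) {  // 2. then the char that ended the run
            int ch = cachedNonSpaceChar;
            cachedNonSpaceChar = NO_CHAR;
            return ch;
        }
        int ch = super.read();                // 3. nothing cached: read from the source
        if (ch != ' ') {
            return ch;
        }
        int run = 1;                          // count the whole run of spaces
        while ((ch = super.read()) == ' ') {
            run++;
        }
        cachedNonSpaceChar = ch;              // deliver this after the run
        if (run >= minRun) {
            return '\n';                      // long run: one newline instead of spaces
        }
        cachedSpaces = run - 1;               // short run: one space now, the rest later
        return ' ';
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        int ch;
        while (n < len && (ch = read()) != -1) {
            cbuf[off + n++] = (char) ch;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = 0;
        while (skipped < n && read() != -1) {
            skipped++;
        }
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false; // the cache would not survive mark()/reset() on the underlying reader
    }
}
```

Wrapping it as new BufferedReader(new SpaceRunToNewlineReader(new FileReader(path), 465)) should let the existing readLine() loop see one record per line. The bulk read and skip methods are routed through read() so that readers such as BufferedReader also see the filtered characters.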
My understanding is that you have a flat CSV file without proper line breaks, which is supposed to have 10 values on each line.
Updated:
1. (Recommended) You can use Scanner class with useDelimiter to parse csv effectively, assuming you are trying to store 10 values from a line:
public static void parseCsvWithScanner() throws IOException {
Scanner scanner = new Scanner(new File("test.csv"));
// set your delimiter for scanner, "," for csv
scanner.useDelimiter(",");
// storing 10 values as a "line"
int LINE_LIMIT = 10;
// implement your own data structure to store each value of CSV
int[] tempLineArray = new int[LINE_LIMIT];
int lineBreakCount = 0;
while(scanner.hasNext()) {
// trim start and end spaces if there is any
String temp = scanner.next().trim();
tempLineArray[lineBreakCount++] = Integer.parseInt(temp);
if (lineBreakCount == LINE_LIMIT) {
// replace your own logic for handling the full array
for(int i=0; i<tempLineArray.length; i++) {
System.out.print(tempLineArray[i]);
} // end replace
// resetting array and counter
tempLineArray = new int[LINE_LIMIT];
lineBreakCount = 0;
}
}
scanner.close();
}
Or use a BufferedReader.
You might not need the ArrayList to store all values; if memory is an issue, replace it with your own logic.
public static void parseCsv(File file) throws IOException {
    // your delimiter
    char TOKEN = ',';
    // your requirement of storing 10 values for each "line"
    int LINE_LIMIT = 10;
    // tmp for storing from BufferedReader.read()
    int tmp;
    // a counter for line break
    int lineBreakCount = 0;
    // array for storing 10 values, assuming the values of CSV are integers
    int[] tempArray = new int[LINE_LIMIT];
    // storing tempArray of each line to ArrayList
    ArrayList<int[]> lineList = new ArrayList<>();
    StringBuilder sb = new StringBuilder();
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        while ((tmp = br.read()) != -1) {
            if ((char) tmp == TOKEN) {
                if (lineBreakCount == LINE_LIMIT) {
                    // your logic to handle the current "line" here.
                    lineList.add(tempArray);
                    // new "line"
                    tempArray = new int[LINE_LIMIT];
                    lineBreakCount = 0;
                }
                // storing current value from buffer with trim of spaces
                tempArray[lineBreakCount] = Integer.parseInt(sb.toString().trim());
                lineBreakCount++;
                // clear the buffer
                sb.delete(0, sb.length());
            } else {
                // add current char from BufferedReader if not delimiter
                sb.append((char) tmp);
            }
        }
    }
    // the last value has no delimiter after it, so store it after the loop
    if (sb.toString().trim().length() > 0) {
        if (lineBreakCount == LINE_LIMIT) {
            lineList.add(tempArray);
            tempArray = new int[LINE_LIMIT];
            lineBreakCount = 0;
        }
        tempArray[lineBreakCount++] = Integer.parseInt(sb.toString().trim());
    }
    // handle the last (possibly incomplete) "line" as well
    if (lineBreakCount > 0) {
        lineList.add(tempArray);
    }
}
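Both versions above assume a comma between every pair of values. If, as the question describes, the last value of one record is separated from the first value of the next only by spaces, a delimiter that accepts either a comma or a run of whitespace may be needed. A sketch of that variant, grouping tokens into records of a fixed size; the class name and the delimiter pattern are assumptions, values are kept as strings, and values containing spaces would be split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class FixedWidthRecords {
    // Groups the values read by scanner into records of valuesPerRecord values each.
    // Assumption: a comma (optionally surrounded by whitespace) or a run of whitespace
    // separates values, so the spaces between records also act as a delimiter.
    static List<List<String>> group(Scanner scanner, int valuesPerRecord) {
        scanner.useDelimiter("\\s*,\\s*|\\s+");
        List<List<String>> records = new ArrayList<>();
        List<String> current = new ArrayList<>(valuesPerRecord);
        while (scanner.hasNext()) {
            current.add(scanner.next());
            if (current.size() == valuesPerRecord) {
                records.add(current);            // a full "line"
                current = new ArrayList<>(valuesPerRecord);
            }
        }
        if (!current.isEmpty()) {
            records.add(current);                // incomplete last record is kept, not dropped
        }
        return records;
    }
}
```

For the file itself: try (Scanner sc = new Scanner(new File("test.csv"))) { List<List<String>> records = FixedWidthRecords.group(sc, 10); ... }.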
I have the following code:
public class Reader {
public static void main(String[] args) throws IOException {
try (FileReader in = new FileReader("D:/test.txt")) {
// BufferedReader br = new BufferedReader(in);
int line = in.read();
for (int i = 0; i < line; i++) {
//System.out.println(line);
System.out.println((char) line);
line = in.read();
}
}
}
}
and a file Test.txt with the content:
Hello
Java
When I run above code it only reads Hello. I would like to read multiple lines using FileReader only. I don't want to use BufferedReader or InputStreamReader etc. Is that possible?
I don't think this version of the code prints "Hello".
You are calling:
int line = in.read();
What does this do? Look in the Javadocs for Reader:
public int read()
throws IOException
Reads a single character. This method will block until a character is available, an I/O error occurs, or the end of the stream is reached.
Your code reads the 'H' from 'Hello', which is 72 in ASCII.
Then it reaches your loop with line == 72:
for(int i=0;i<line;i++)
... making the decision "is 0 less than 72? Yes, so I'll go into the loop block".
Then each time it reads a character the value of line changes to another integer, and each time loop goes around i increments. So the loop says "Keep going for as long as the ASCII value of the character is greater than the number of iterations I've counted".
... and each time it goes around, it prints that character on a line of its own.
As it happens, with your input it reaches end-of-file (-1) first, and since -1 < i, the loop condition is no longer met.
But for longer inputs it would stop at the first 'a' after the 97th character, or the first 'b' after the 98th character, and so on (because ASCII 'a' is 97, etc.). For your input it prints something like:
H
e
l
l
o
J
a
v
a
This isn't what you want:
You don't want your loop to repeat until i >= "the character I just read". You want it to repeat until in.read() returns -1. You have probably been taught how to loop until a condition is met.
You don't want to println() each character, since that adds newlines you don't want. Use print().
You should also look at the Reader.read(char[] cbuf) method, and see if you can write the code to work in bigger chunks.
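A possible shape for the chunked version, using only FileReader and Reader.read(char[]), which returns the number of chars actually stored, or -1 at the end of the stream. The buffer size of 1024 is an arbitrary choice, and the class and method names are made up.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

class ChunkedRead {
    // Reads everything from in, one chunk at a time; read(buffer) returns how many
    // chars were stored (at most buffer.length), or -1 once the stream is exhausted.
    static String readAll(Reader in) throws IOException {
        char[] buffer = new char[1024];
        StringBuilder text = new StringBuilder();
        int n;
        while ((n = in.read(buffer)) != -1) {
            text.append(buffer, 0, n); // only the first n chars are valid this time round
        }
        return text.toString();
    }

    // FileReader-only usage: print the whole file, line breaks included.
    static void printFile(String path) throws IOException {
        try (FileReader in = new FileReader(path)) {
            System.out.print(readAll(in));
        }
    }
}
```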
Two patterns you'll use over and over again in your programming career are:
Type x = getSomehow();
while(someCondition(x)) {
doSomethingWith(x);
x = getSomehow();
}
... and ...
Type x = value_of_x_which_meets_condition;
while(someCondition(x)) {
x = getSomehow();
doSomethingWith(x);
}
See if you can construct something with FileReader and the value you get from it, filling in the "somehows".
Reading a file character by character without any buffering is inefficient. I would probably wrap the FileReader in a BufferedReader, or simply use Scanner to read the content of the file, but if you absolutely want/need/have to use only FileReader, you can try
int line = in.read();
while (line != -1) {
System.out.print((char) line);
line = in.read();
}
instead of your for (int i = 0; i < line; i++) {...} loop.
Read slim's answer carefully. In short: the loop condition shouldn't depend on whether the number of characters read so far is less than the numeric value of the character just read (i < line). Take, for example, this file:
My name

is

not important now
This file contains characters you normally don't see, such as \r and \n; in reality it looks like
My name\r\n
\r\n
is\r\n
\r\n
not important now
where the numeric value of \n is 10 (and of \r is 13). By the time the loop has printed My name\r\n\r (10 characters, indices 0 to 9), i has become 10, and the character it has just read is the second \n, whose value is also 10. So the condition i<line fails (10<10 is false), and the loop ends before it ever reaches "is".
So instead of checking i<line, you should check whether the value read is not EOF (end of file, or end of stream in our case), which is represented by -1 as specified in the read method documentation. Your condition should therefore be line != -1, and because you no longer need i, just use a while loop here.
Returns:
The character read, or -1 if the end of the stream has been reached
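To check this reasoning, both loops can be run on an in-memory copy of the example file. The sketch below collects what each loop would print instead of printing it; StringReader stands in for FileReader, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;

class LoopComparison {
    // The question's loop: keeps going only while i < (value of the char just read).
    static String forLoopVersion(Reader in) throws IOException {
        StringBuilder printed = new StringBuilder();
        int line = in.read();
        for (int i = 0; i < line; i++) {
            printed.append((char) line);
            line = in.read();
        }
        return printed.toString();
    }

    // The fix: keep going until read() reports the end of the stream (-1).
    static String whileLoopVersion(Reader in) throws IOException {
        StringBuilder printed = new StringBuilder();
        int line = in.read();
        while (line != -1) {
            printed.append((char) line);
            line = in.read();
        }
        return printed.toString();
    }
}
```

On "My name\r\n\r\nis\r\n\r\nnot important now", forLoopVersion returns "My name\r\n\r", while whileLoopVersion returns the whole text. On "Hello\r\nJava" both return the whole text, because the loop only ends when read() returns -1.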
You will have to read the content char by char and parse for a new line sequence.
A new line sequence can be any of the following:
a single carriage return '\r'
a single line feed '\n'
a carriage return followed by a line feed "\r\n"
EDIT
You could try the following:
public List<String> readLinesUsingFileReader(String filename) throws IOException {
List<String> lines = null;
try (FileReader fileReader = new FileReader(filename)) {
lines = readLines(fileReader);
}
return lines;
}
private List<String> readLines(FileReader fileReader) throws IOException {
List<String> lines = new ArrayList<>();
boolean newLine = false;
int c, p = 0;
StringBuilder line = new StringBuilder();
while(-1 != (c = fileReader.read())) {
if(c == '\n' && p != '\r') {
newLine = true;
} else if(c == '\r') {
newLine = true;
} else {
if(c != '\n' && c != '\r') {
line.append((char) c);
}
}
if(newLine) {
lines.add(line.toString());
line = new StringBuilder();
newLine = false;
}
p = c;
}
if(line.length() > 0) {
lines.add(line.toString());
}
return lines;
}
Note that the code above reads the whole file into a List, which might not be well suited for large files. In that case you may want to implement a streaming approach that reads one line at a time, for example String readNextLine(FileReader fileReader) { ... }.
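A minimal sketch of such a streaming reader, assuming it is acceptable to keep one bit of state between calls: after a '\r' it remembers to swallow a directly following '\n', so no character ever has to be pushed back into the FileReader. The class and method names are hypothetical, and it accepts any Reader, so a FileReader works too.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical streaming reader: one line per call, null at end of stream.
// "\n", "\r" and "\r\n" all end a line, as in readLines() above.
class StreamingLineReader {
    private final Reader reader;
    private boolean skipLf = false; // set after '\r', so a directly following '\n' is swallowed

    StreamingLineReader(Reader reader) {
        this.reader = reader;
    }

    String readNextLine() throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (skipLf) {
                skipLf = false;
                if (c == '\n') {
                    continue; // second half of a "\r\n" pair
                }
            }
            if (c == '\n') {
                return line.toString();
            }
            if (c == '\r') {
                skipLf = true;
                return line.toString();
            }
            line.append((char) c);
        }
        // End of stream: return an unterminated last line, or null if nothing is left.
        return line.length() > 0 ? line.toString() : null;
    }
}
```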
Some basic tests:
Create test files to read
private final static String txt0 = "testnl0.txt";
private final static String txt1 = "testnl1.txt";
private final static String txt2 = "testnl2.txt";
@BeforeClass
public static void genTestFile() throws IOException {
try (OutputStream os = new FileOutputStream(txt0)) {
os.write((
"Hello\n" +
",\r\n" +
"World!" +
"").getBytes());
}
try (OutputStream os = new FileOutputStream(txt1)) {
os.write((
"\n" +
"\r\r" +
"\r\n" +
"").getBytes());
}
try (OutputStream os = new FileOutputStream(txt2)) {
os.write((
"").getBytes());
}
}
Test using the created files
@Test
public void readLinesUsingFileReader0() throws IOException {
List<String> lines = readLinesUsingFileReader(txt0);
Assert.assertEquals(3, lines.size());
Assert.assertEquals("Hello", lines.get(0));
Assert.assertEquals(",", lines.get(1));
Assert.assertEquals("World!", lines.get(2));
}
@Test
public void readLinesUsingFileReader1() throws IOException {
List<String> lines = readLinesUsingFileReader(txt1);
Assert.assertEquals(4, lines.size());
Assert.assertEquals("", lines.get(0));
Assert.assertEquals("", lines.get(1));
Assert.assertEquals("", lines.get(2));
Assert.assertEquals("", lines.get(3));
}
@Test
public void readLinesUsingFileReader2() throws IOException {
List<String> lines = readLinesUsingFileReader(txt2);
Assert.assertTrue(lines.isEmpty());
}
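As long as the file contains the newline characters, printing the characters exactly as read reproduces the line breaks:
public static void main(String[] args) throws IOException {
try (FileReader in = new FileReader("D:/test.txt")) {
char[] a = new char[50];
int n = in.read(a); // reads up to 50 chars into the array and returns how many were read (-1 at end of file)
for (int i = 0; i < n; i++)
System.out.print(a[i]); // prints the characters one by one, newlines included
}
}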
It will print
Hello
Java
I solved the above problem by using this code
public class Reader
{
public static void main(String[]args) throws IOException{
try (FileReader in = new FileReader("D:/test.txt")) {
int line = in.read();
while(line!=-1)
{
System.out.print((char)line);
line = in.read();
} }
}
}
But I have one more question: if I write a for loop instead of a while loop, like this
for(int i=0;i<line;i++)
it prints only the first line. Could anybody tell me why?
Reader.read() returns the int code of a single char, or -1 if the end of the file is reached:
http://docs.oracle.com/javase/7/docs/api/java/io/Reader.html#read()
So, read the file char by char and check for LF (line feed, '\n', 0x0A, 10 in decimal), CR (carriage return, '\r', 0x0D, 13 in decimal) and the end-of-file code (-1).
Note: Windows uses two chars to encode the end of a line, "\r\n". Most other systems, including Linux and macOS, use only "\n".
final StringBuilder line = new StringBuilder(); // line buffer
try (FileReader in = new FileReader("D:/test.txt")) {
int chAr, prevChar = 0x0A; // chAr - just read char, prevChar - previously read char
while (prevChar != -1) { // until the last read char is EOF
chAr = in.read(); // read int code of the next char
switch (chAr) {
case 0x0D: // CR - just
break; // skip
case -1: // EOF
if (prevChar == 0x0A) {
break; // no need a new line if EOF goes right after LF
// or no any chars were read before (prevChar isn't
// changed from its initial 0x0A)
}
case 0x0A: // or LF
System.out.println("line:" + line.toString()); // get string from the line buffer
line.setLength(0); // cleanup the line buffer
break;
default: // if any other char code is read
line.append((char) chAr); // append to the line buffer
}
prevChar = chAr; // remember the current char as previous one for the next iteration
}
}
I am trying to read a text file which has -
hello James!
How are you today!
I want to read each character in the string until I find the EOL character. As I am using Windows, the EOL sequence is \r\n. How can I write a condition that goes through all the characters of the string and prints them one by one until it reaches the EOL (\r\n)?
int readedValue;
do
{
while((readedValue = bufferReader.read()) != 10)
{
//readedValue = bufferReader.read();
char ch = (char) readedValue;
System.out.print(ch);
}
}
while ((readedValue = bufferReader.read()) != -1);
When I read the file now, I get the output hello James!ow are you today!
I am not getting the 'H' of "How". What can I change to get the complete text?
As people have noted, the readLine() method reads up to the next line separator and returns the line with the separator removed. So your tests for '\n' and '\r' in line can never evaluate to true.
But you can easily add an extra end-of-line when you output the line string.[1]
[1] That is, unless you actually need to preserve the exact end-of-line sequence from the input stream.
You ask:
Instead of using readline(), Is there any way i can use buffer reader to read each character and print them?
Yes, sure. The read() method returns either one character or -1 to indicate EOF. So:
int ch = br.read();
while (ch != -1) {
System.out.print((char) ch);
ch = br.read();
}
You can use something like that:
while((line=input.readLine())!=null) {
// do something
}
If you want to read char by char, you can use this:
int readedValue;
while ((readedValue = reader.read()) != -1) {
char ch = (char) readedValue;
// do something
}
Here is an example (with a string instead of a file) for your new problem:
String line;
int readedValue;
String s = "hello James!\r\nHow are you today!";
StringReader input = new StringReader(s);
BufferedReader lineReader= new BufferedReader (input);
while((line=lineReader.readLine())!=null) {
StringReader input2 = new StringReader(line);
BufferedReader charReader= new BufferedReader (input2);
while((readedValue = charReader.read()) != -1) {
char ch = (char) readedValue;
System.out.print(ch);
}
}
Your problem is about magic numbers.
Your while loop will run forever whenever charAt(21) != '\n' && charAt(22) != '\r', because those two indices never change.
Use an index variable instead, and increment it inside the loop:
while (charAt(i) != '\n' && charAt(i + 1) != '\r') {
// ... loop body ...
i++;
}
I am reading a file via the BufferedReader
String filename = ...
br = new BufferedReader( new FileInputStream(filename));
while (true) {
String s = br.readLine();
if (s == null) break;
...
}
I need to know whether the lines are separated by '\n' or '\r\n'.
Is there a way I can find out?
I don't want to open a FileInputStream just to scan it first.
Ideally I would like to ask the BufferedReader since it must know.
I am happy to override the BufferedReader to hack it but I really don't want to open the filestream twice.
Note: the current line separator (returned by System.getProperty("line.separator") ) can not be used as the file could have been written by another app on another operating system.
To stay close to what BufferedReader does, you can use the following method, which detects the first line separator in the file: \n, \r, \r\n or \n\r. (Note that BufferedReader itself treats \n\r as two separate line breaks.)
public static String retrieveLineSeparator(File file) throws IOException {
    // compares raw bytes, so this assumes an ASCII-compatible encoding such as UTF-8
    try (FileInputStream fis = new FileInputStream(file)) {
        int current;
        // read() returns -1 at end of file; available() is not a reliable end-of-file test
        while ((current = fis.read()) != -1) {
            if (current == '\n' || current == '\r') {
                String lineSeparator = String.valueOf((char) current);
                int next = fis.read();
                if (next != current && (next == '\r' || next == '\n')) {
                    lineSeparator += (char) next;
                }
                return lineSeparator;
            }
        }
    }
    return null; // no line separator found
}
After reading the Java docs (I confess to being a pythonista), it seems that there isn't a clean way to determine the line-end encoding used in a specific file.
The best thing I can recommend is that you use BufferedReader.read() and iterate over every character in the file. Something like this:
String filename = ...
try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
    StringBuilder l = new StringBuilder();
    int c;
    while ((c = br.read()) != -1) {
        if (c == '\n') {
            // "\n" line ending: do stuff with l (the endl-free line)
            l.setLength(0);
        } else if (c == '\r') {
            br.mark(1); // remember this position so we can peek one char ahead
            int ctwo = br.read();
            if (ctwo == '\n') {
                // do extra stuff since you know that you've got a \r\n
            } else {
                // a lone \r: put the peeked char back
                br.reset();
            }
            // do stuff with l (the endl-free line)
            l.setLength(0);
        } else {
            l.append((char) c);
        }
    }
    // if l is not empty here, the last line had no line ending
}
BufferedReader.readLine() does not provide any means of determining what the line break was. If you need to know, you'll need to read characters in yourself and find line breaks yourself.
You may be interested in the internal LineBuffer class from Guava (as well as the public LineReader class it's used in). LineBuffer provides a callback method void handleLine(String line, String end) where end is the line break characters. You could probably base something to do what you want on that. An API might look something like public Line readLine() where Line is an object that contains both the line text and the line end.
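A sketch of what such an API could look like. The Line and LineWithEndReader names are invented here (they are not Guava's classes), and the sketch uses java.io.PushbackReader to peek one character after a '\r'. Each call returns the line text together with the terminator that ended it: "\n", "\r", "\r\n", or "" for a last line without one.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

// Hypothetical API: readLine() returns both the text and the line ending.
final class LineWithEndReader {
    static final class Line {
        final String text;
        final String end;

        Line(String text, String end) {
            this.text = text;
            this.end = end;
        }
    }

    private final PushbackReader in;

    LineWithEndReader(Reader reader) {
        this.in = new PushbackReader(reader, 1); // one char of lookahead for "\r\n"
    }

    /** Returns the next line, or null at end of stream. */
    Line readLine() throws IOException {
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                return new Line(text.toString(), "\n");
            }
            if (c == '\r') {
                int next = in.read();
                if (next == '\n') {
                    return new Line(text.toString(), "\r\n");
                }
                if (next != -1) {
                    in.unread(next); // not part of this terminator
                }
                return new Line(text.toString(), "\r");
            }
            text.append((char) c);
        }
        return text.length() > 0 ? new Line(text.toString(), "") : null;
    }
}
```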
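Note that BufferedReader does not accept a FileInputStream directly; wrap it in an InputStreamReader first: new BufferedReader(new InputStreamReader(new FileInputStream(filename))).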
No, you cannot find out the line terminator character that was used in the file being read by BufferedReader. That information is lost while reading the file.
Unfortunately, all the answers below are incorrect.
Edit: And yes you can always extend BufferedReader to include the additional functionality you desire.
The answer would be You can't find out what was the line ending.
I was looking into what determines line endings in the same function. After looking at the BufferedReader source code, I can say that BufferedReader.readLine() ends a line on '\r' or '\n', and skips a '\n' that immediately follows a '\r'. This is hardcoded and does not depend on any settings.
If you happen to be reading this file into a Swing text component then you can just use the JTextComponent.read(...) method to load the file into the Document. Then you can use:
textComponent.getDocument().getProperty( DefaultEditorKit.EndOfLineStringProperty );
to get actual EOL string that was used in the file.
Maybe you could use Scanner instead.
You can pass regular expressions to Scanner#useDelimiter() to set custom delimiter.
String regex="(\r)?\n";
String filename=....;
Scanner scan = new Scanner(new FileInputStream(filename));
scan.useDelimiter(Pattern.compile(regex));
while (scan.hasNext()) {
String str= scan.next();
// todo
}
You could use this code below to convert BufferedReader to Scanner
new Scanner(bufferedReader);
Not sure if useful, but sometimes I need to find out the line delimiter after I've read the file already far-down the road.
In this case I use this code:
/**
* <h1> Identify which line delimiter is used in a string </h1>
*
* This is useful when processing files that were created on different operating systems.
*
* @param str - the string with the mystery line delimiter.
* @return the line delimiter for windows, {@code \r\n}, <br>
* unix/linux {@code \n} or legacy mac {@code \r} <br>
* if none can be identified, it falls back to unix {@code \n}
*/
public static String identifyLineDelimiter(String str) {
if (str.matches("(?s).*(\\r\\n).*")) { //Windows //$NON-NLS-1$
return "\r\n"; //$NON-NLS-1$
} else if (str.matches("(?s).*(\\n).*")) { //Unix/Linux //$NON-NLS-1$
return "\n"; //$NON-NLS-1$
} else if (str.matches("(?s).*(\\r).*")) { //Legacy mac os 9. Newer OS X use \n //$NON-NLS-1$
return "\r"; //$NON-NLS-1$
} else {
return "\n"; //fallback onto '\n' if nothing matches. //$NON-NLS-1$
}
}
If you are using groovy, you can simply do:
def lineSeparator = new File('path/to/file').text.contains('\r\n') ? '\r\n' : '\n'
The string "MO""RET" gets stored in items[1] after the split command. After it is stored, I call replaceAll on this string, and it removes all the double quotes.
But I want it to be stored as MO"RET. How do I do that? In the CSV file that I process with split, double quotes within the contents of a text field are repeated (example: "This account is a ""large"" one"). So I want to keep one of the two quotes when they are repeated in the middle of a string, and drop the surrounding quotes if present. How can I do that?
String items[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
items[1] has "MO""RET"
String recordType = items[1].replaceAll("\"","");
After this, recordType is MORET, but I want it to be MO"RET.
Don't use regex to split a CSV line. This is asking for trouble ;) Just parse it character-by-character. Here's an example:
public static List<List<String>> parseCsv(InputStream input, char separator) throws IOException {
BufferedReader reader = null;
List<List<String>> csv = new ArrayList<List<String>>();
try {
reader = new BufferedReader(new InputStreamReader(input, "UTF-8"));
for (String record; (record = reader.readLine()) != null;) {
boolean quoted = false;
StringBuilder fieldBuilder = new StringBuilder();
List<String> fields = new ArrayList<String>();
for (int i = 0; i < record.length(); i++) {
char c = record.charAt(i);
fieldBuilder.append(c);
if (c == '"') {
quoted = !quoted;
}
if ((!quoted && c == separator) || i + 1 == record.length()) {
fields.add(fieldBuilder.toString().replaceAll(separator + "$", "")
.replaceAll("^\"|\"$", "").replace("\"\"", "\"").trim());
fieldBuilder = new StringBuilder();
}
if (c == separator && i + 1 == record.length()) {
fields.add("");
}
}
csv.add(fields);
}
} finally {
if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
}
return csv;
}
Yes, there's a little regex involved, but it only trims off the trailing separator and the surrounding quotes of a single field.
However, you can also use any third-party Java CSV API.
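If a third-party library is not an option, the same character-by-character idea can also unescape doubled quotes as it goes, instead of cleaning each field up with regexes afterwards. A sketch for a single record, assuming no quoted field spans several lines; the class and method names are hypothetical, and unlike parseCsv above it does not trim fields.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLine {
    // Splits one CSV record into fields: separators inside quotes are literal,
    // and "" inside a quoted field becomes a single ".
    static List<String> parseLine(String record, char separator) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote: keep one, skip the other
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == separator) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());      // last field (possibly empty)
        return fields;
    }
}
```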
How about:
String recordType = items[1].replaceAll( "\"\"", "\"" );
I'd suggest using replace instead of replaceAll,
since replaceAll takes a regex as its first argument.
The requirement is to replace two consecutive quotes with one quote:
String recordType = items[1].replace( "\"\"", "\"" );
To see the difference between replace and replaceAll, execute the code below:
recordType = items[1].replace( "$$", "$" );
recordType = items[1].replaceAll( "$$", "$" );
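What that comparison shows, as a self-contained sketch (the input "MO$$RET" and the class name are made up): replace treats "$$" literally, while replaceAll reads it as a regex (two end-of-input anchors) and reads "$" in the replacement as the start of a group reference, so the bare "$" is rejected with an IllegalArgumentException. Escaping both arguments with Pattern.quote and Matcher.quoteReplacement makes replaceAll behave like replace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceVsReplaceAll {
    // Literal replacement: "$$" is just two dollar signs.
    static String literal(String s) {
        return s.replace("$$", "$");
    }

    // Regex replacement without escaping: "$$" matches the end of input, and the
    // bare "$" in the replacement is an incomplete group reference.
    static String regexUnescaped(String s) {
        return s.replaceAll("$$", "$");
    }

    // Regex replacement with both arguments escaped: same result as literal().
    static String regexEscaped(String s) {
        return s.replaceAll(Pattern.quote("$$"), Matcher.quoteReplacement("$"));
    }
}
```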
Here you can use the regular expression.
recordType = items[1].replaceAll( "\\B\"", "" );
recordType = recordType.replaceAll( "\"\\B", "" );
The first statement removes every quote that is not preceded by a word character (such as the opening quote, or the second quote of a "" pair).
The second statement removes every quote that is not followed by a word character (such as the closing quote).
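A quick way to try the two statements on quoted values like those in the question (the class name is made up). On "MO""RET" they produce MO"RET. On "This account is a ""large"" one", however, each doubled pair has a space on one side, and the \B checks end up removing every quote, leaving This account is a large one; for input like that, a character-by-character parser is likely more reliable.

```java
class BoundaryQuotes {
    // Step 1: drop every quote that is not preceded by a word character.
    // Step 2: drop every quote that is not followed by a word character.
    static String unquote(String field) {
        String s = field.replaceAll("\\B\"", "");
        return s.replaceAll("\"\\B", "");
    }
}
```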