I'm having issues with BufferedWriter/BufferedReader.
Basically, whenever I try to read a file with BufferedReader.readLine(), it reads everything up to the newline character (i.e. the newline character is omitted).
For instance:
String temp;
File f = new File(path.toURI());
BufferedReader reader = new BufferedReader(new FileReader(f));
while ((temp = reader.readLine()) != null) {
//Work with temp
}
I know about the existence of BufferedWriter#newLine(), but it writes the platform line separator, not the delimiter (\n, \r or \r\n) that readLine() omitted.
From my understanding, if I read the following with readLine() and write each line back with newLine():
abcd\n
efgh\r\n
ijkl\r
I will get:
abcd\n
efgh\n
ijkl\n
What I am asking is: is there any class that can read characters without omitting any of them, like BufferedInputStream, while retaining the ability to read lines like BufferedReader#readLine()?
\n is the Linux/Unix line ending, while \r\n is the Windows line ending.
If a file mixes both line endings, it should be reformatted.
My suggestion: if you ever come across such a file, reformat it to use either \n or \r\n (depending on your OS, not that it matters much nowadays). It makes your life easier, and the life of the next person who has to use it.
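A minimal sketch of that reformatting: readLine() already accepts \n, \r and \r\n as terminators, so writing every line back with one fixed separator unifies mixed endings. The class and method names (LineEndingNormalizer, normalize) are hypothetical. Unlike BufferedWriter.newLine(), which always writes System.lineSeparator(), this writes whatever separator you pass in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class LineEndingNormalizer {
    // Copies every line from in to out and terminates each one with lineSep.
    // readLine() strips "\n", "\r" and "\r\n" alike, so mixed input is unified.
    static void normalize(Reader in, Writer out, String lineSep) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            out.write(line);
            out.write(lineSep);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        normalize(new StringReader("abcd\nefgh\r\nijkl\r"), out, "\n");
        System.out.print(out); // every line now ends with a plain \n
    }
}
```

Note that a last line without any terminator gains one; the caller stays responsible for closing both streams.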
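Alternatively (please don't use this :/) you can modify BufferedReader.readLine(boolean ignoreLF) like this. Note that this method is package-private and uses BufferedReader's private fields (cb, nextChar, nChars, skipLF), so the code below only compiles inside a modified copy of the BufferedReader source, not in a subclass: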
String readLine(boolean ignoreLF) throws IOException {
StringBuffer s = null;
int startChar;
synchronized (lock) {
ensureOpen();
boolean omitLF = ignoreLF || skipLF;
bufferLoop:
for (;;) {
if (nextChar >= nChars)
fill();
if (nextChar >= nChars) { /* EOF */
if (s != null && s.length() > 0) {
// The last line had no terminator, so there is nothing to append.
return s.toString();
}
else
return null;
}
boolean eol = false;
char c = 0;
int i;
/* Skip a leftover '\n', if necessary */
if (omitLF && (cb[nextChar] == '\n'))
nextChar++;
skipLF = false;
omitLF = false;
charLoop:
for (i = nextChar; i < nChars; i++) {
c = cb[i];
if ((c == '\n') || (c == '\r')) {
eol = true;
break charLoop;
}
}
startChar = nextChar;
nextChar = i;
if (eol) {
String str;
if (s == null) {
str = new String(cb, startChar, i - startChar);
} else {
s.append(cb, startChar, i - startChar);
str = s.toString();
}
nextChar++;
if (c == '\r') {
skipLF = true;
}
// skipLF is a boolean, so compare the terminator char c instead.
// A lone '\r' is also reported as "\r\n" here.
return str + (c == '\r' ? "\r\n" : "\n");
}
if (s == null)
s = new StringBuffer(defaultExpectedLineLength);
s.append(cb, startChar, i - startChar);
}
}
}
Source code edited from:
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/BufferedReader.java#BufferedReader.readLine%28boolean%29
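It probably won't be too much trouble to extend BufferedReader to include a \n or \r in the return from readLine(). The work is done in readLine(boolean ignoreLF), but that method is package-private, so a subclass outside java.io cannot override it; you would have to reimplement its logic in an override of the public readLine(). Its documentation reads: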
Reads a line of text. A line is considered to be terminated by any one
of a line feed ('\n'), a carriage return ('\r'), or a carriage return
followed immediately by a linefeed.
Parameters: ignoreLF - If true, the next '\n' will be skipped
Returns: A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
Throws: IOException - If an I/O error occurs
See also: LineNumberReader.readLine()
One solution could be to extend from BufferedReader and override the readLine() method (as it was already proposed in other answers).
Take this simplified example only as a PoC.
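import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
class MyReader extends BufferedReader {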
int size = 8192;
public MyReader(Reader in) {
super(in);
}
public MyReader(Reader in, int sz) {
super(in, sz);
this.size = sz;
}
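@Override
public String readLine() throws IOException {
int read = super.read();
if (read < 0) {
return null; // end of stream: return null like BufferedReader, otherwise while-loops never end
}
StringBuilder sb = new StringBuilder(this.size);
for (; read >= 0 && read != '\n'; read = super.read()) {
sb.append((char) read);
}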
// in case you want also to preserve the line feed character
// sb.append('\n');
return sb.toString();
}
}
public class MyReaderDemo {
public static void main(String[] args) throws IOException {
String text = "abcd\n"
+ "efgh\r\n"
+ "ijkl\r";
ByteArrayInputStream bis = new ByteArrayInputStream(
text.getBytes(StandardCharsets.ISO_8859_1)
);
// BufferedReader in = new BufferedReader(new InputStreamReader(bis));
BufferedReader in = new MyReader(new InputStreamReader(bis));
System.out.println(Arrays.toString(in.readLine().getBytes()));
System.out.println(Arrays.toString(in.readLine().getBytes()));
System.out.println(Arrays.toString(in.readLine().getBytes()));
}
}
output with BufferedReader
[97, 98, 99, 100]
[101, 102, 103, 104]
[105, 106, 107, 108]
output with MyReader
[97, 98, 99, 100]
[101, 102, 103, 104, 13]
[105, 106, 107, 108, 13]
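MyReader only splits on '\n', so an "efgh\r\n" line keeps its '\r' and a lone '\r' does not end a line at all. A sketch of a variant that returns each line together with the exact terminator found ("\n", "\r" or "\r\n") can use mark()/reset() to peek one character after a '\r'. The class and method names (TerminatorKeepingReader, readLineWithTerminator) are hypothetical; the inherited readLine() is left untouched.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class TerminatorKeepingReader extends BufferedReader {
    TerminatorKeepingReader(Reader in) {
        super(in);
    }

    // Returns the next line including its terminator ("\n", "\r" or "\r\n"),
    // or null at end of stream. A last line without a terminator is returned as-is.
    String readLineWithTerminator() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = read()) != -1) {
            sb.append((char) c);
            if (c == '\n') {
                break;
            }
            if (c == '\r') {
                mark(1); // remember this position so the peeked char can be pushed back
                if (read() == '\n') {
                    sb.append('\n'); // "\r\n"
                } else {
                    reset(); // lone "\r": un-read the peeked char
                }
                break;
            }
        }
        return sb.length() == 0 ? null : sb.toString();
    }

    public static void main(String[] args) throws IOException {
        try (TerminatorKeepingReader r =
                 new TerminatorKeepingReader(new StringReader("abcd\nefgh\r\nijkl\r"))) {
            String line;
            while ((line = r.readLineWithTerminator()) != null) {
                // Make the terminators visible when printing.
                System.out.println(line.replace("\r", "\\r").replace("\n", "\\n"));
            }
        }
    }
}
```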
Related
I am using BufferedReader to read a text file line by line. Then I use a method to normalize each line of text. But there is something wrong with my normalization method: after calling it, the BufferedReader object stops reading the file. Can someone help me with this?
Here is my code:
public static void main(String[] args) {
String string = "";
try (BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
String line;
while ((line = br.readLine()) != null) {
string += normalize(line);
}
} catch (Exception e) {
}
System.out.println(string);
}
public static String normalize(String string) {
StringBuilder text = new StringBuilder(string.trim());
for(int i = 0; i < text.length(); i++) {
if(text.charAt(i) == ' ') {
removeWhiteSpaces(i + 1, text);
}
}
if(text.charAt(text.length() - 1) != '.') {
text.append('.');
}
text.append("\n");
return text.toString();
}
public static void removeWhiteSpaces(int index, StringBuilder text) {
int j = index;
while(text.charAt(j) == ' ') {
text.deleteCharAt(j);
}
}
And here is the text file that I use:
abc .
asd.
dasd.
I think the problem is in your removeWhiteSpaces(i + 1, text) call; if the string processing throws an exception, the loop exits and the reader won't get to read the next line.
You also don't check for an empty string: for an empty line, text.charAt(text.length() - 1) becomes charAt(-1), which throws an IndexOutOfBoundsException.
Print the exception: change your catch block to write it out:
} catch (Exception e) {
e.printStackTrace();
}
Another problem is your while(text.charAt(j) == ' ') loop: it deletes characters without first checking the length of the StringBuilder.
Try this:
while ((line = br.readLine()) != null) {
if(line.trim().isEmpty()) {
continue;
}
string += normalize(line);
}
Try Scanner:
Scanner scan = new Scanner(is);
int rowCount = 0;
while (scan.hasNextLine()) {
String temp = scan.nextLine();
if(temp.trim().length()==0){
continue;
}
}
//rest of your logic
The normalize function is causing this.
The following tweak to it should fix it:
public static String normalize(String string) {
if(string.length() < 1) {
return "";
}
StringBuilder text = new StringBuilder(string.trim());
if(text.length() < 1){
return "";
}
for(int i = 0; i < text.length(); i++) {
if(text.charAt(i) == ' ') {
removeWhiteSpaces(i + 1, text);
}
}
if(text.charAt(text.length() - 1) != '.') {
text.append('.');
}
text.append("\n");
return text.toString();
}
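A shorter sketch of the same normalization, swapping the manual removeWhiteSpaces() loop for a regular expression. It assumes only runs of spaces (not tabs) need collapsing, which matches the original check for ' '. LineNormalizer is a hypothetical class name.

```java
class LineNormalizer {
    // Trims the line, collapses runs of spaces to one space, makes sure it
    // ends with '.', and terminates it with a newline. Blank lines give "".
    static String normalize(String line) {
        String text = line.trim().replaceAll(" {2,}", " ");
        if (text.isEmpty()) {
            return ""; // avoids the charAt(-1) crash on blank lines
        }
        if (text.charAt(text.length() - 1) != '.') {
            text += ".";
        }
        return text + "\n";
    }

    public static void main(String[] args) {
        System.out.print(normalize("abc   def")); // abc def.
        System.out.print(normalize("   "));       // prints nothing
        System.out.print(normalize("asd."));      // asd.
    }
}
```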
Part of the problem is in the understanding of the readLine() method. The documentation states:
Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
https://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html#readLine()
So an empty line is returned as an empty string "", not null; readLine() returns null only at the end of the stream. The loop stops at the empty line because normalize("") throws an exception (text.charAt(-1)), which your empty catch block hides.
The code proposed by @tijn167 works around this while still using BufferedReader. If you are not restricted to BufferedReader, use Scanner as @Abhishek Soni suggested.
Also, your removeWhiteSpaces() method checks for spaces, but an empty line contains no characters at all (readLine() strips the \r and \n terminators), so the condition text.charAt(j) == ' ' never applies to it.
The second line of your file is empty, so normalize() throws on it and the while loop stops.
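A quick way to check what readLine() returns for an empty line is to feed it the file contents from a string. This sketch assumes the file really has an empty second line, as the answers above suggest; EmptyLineDemo and readAll are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class EmptyLineDemo {
    // Collects every value readLine() returns before it returns null.
    static List<String> readAll(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumed file contents, with an empty second line.
        System.out.println(readAll("abc .\n\nasd.\ndasd.")); // [abc ., , asd., dasd.]
    }
}
```

The empty line comes back as "" and reading continues, so the loop only ends early if something inside it throws.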
I have a lot of massive files I need to convert to CSV by replacing certain characters.
I am looking for a reliable approach that, given an InputStream, produces an OutputStream with every character c1 replaced by c2.
The trick here is to read and write in parallel; I can't fit a whole file in memory.
Do I need to run it in a separate thread if I want to read and write at the same time?
Thanks a lot for your advice.
To copy data from an input stream to an output stream, you write data while you're reading it, either a byte (or character) at a time or a line at a time. This all happens in one thread; no separate thread is needed.
Here is an example that reads in a file converting all 'x' characters to 'y'.
BufferedInputStream in = new BufferedInputStream(new FileInputStream("input.dat"));
BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream("output.dat"));
int ch;
while((ch = in.read()) != -1) {
if (ch == 'x') ch = 'y';
out.write(ch);
}
out.close();
in.close();
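The same byte-at-a-time loop can be wrapped in a reusable method that takes any InputStream and OutputStream, which is closer to what the question asks for. A sketch, assuming both characters are single bytes (e.g. ASCII; in UTF-8, ASCII byte values never occur inside multi-byte sequences, so this is safe there too). ByteReplacer is a hypothetical name, and the caller stays responsible for closing the streams.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class ByteReplacer {
    // Streams in to out, writing `to` wherever the byte `from` appears.
    // Memory use is bounded by the two 8K buffers, not by the input size.
    static void replace(InputStream in, OutputStream out, int from, int to) throws IOException {
        InputStream bin = new BufferedInputStream(in);
        OutputStream bout = new BufferedOutputStream(out);
        int b;
        while ((b = bin.read()) != -1) {
            bout.write(b == from ? to : b);
        }
        bout.flush(); // push the last buffered bytes through to `out`
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        replace(new ByteArrayInputStream("a;b;c".getBytes(StandardCharsets.US_ASCII)), out, ';', ',');
        System.out.println(new String(out.toByteArray(), StandardCharsets.US_ASCII)); // a,b,c
    }
}
```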
Or, if you can use a Reader and process a line at a time, you can use this approach:
BufferedReader reader = new BufferedReader(new FileReader("input.dat"));
PrintWriter writer = new PrintWriter(
new BufferedOutputStream(new FileOutputStream("output.dat")));
String str;
while ((str = reader.readLine()) != null) {
str = str.replace('x', 'y'); // replace character at a time
str = str.replace("abc", "ABC"); // replace string sequence
writer.println(str);
}
writer.close();
reader.close();
BufferedInputStream and BufferedReader read ahead and keep a buffer (8192 bytes or chars by default) for performance. Very large files can therefore be processed while keeping only about 8K in memory at a time (plus the current line, for the readLine() approach).
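If you want character semantics with an explicit charset but don't want readLine()/println() to rewrite the line endings, a Reader/Writer loop over a char[] buffer does the replacement while passing \r and \n through untouched. A sketch with a hypothetical CharReplacer class; the 8192-char buffer simply mirrors BufferedReader's default size. For real files, the Reader and Writer would be an InputStreamReader and OutputStreamWriter constructed with an explicit charset such as StandardCharsets.UTF_8.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class CharReplacer {
    // Copies chars from in to out, replacing c1 with c2. Line terminators
    // are copied as-is because nothing here splits the text into lines.
    static void replace(Reader in, Writer out, char c1, char c2) throws IOException {
        char[] buf = new char[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == c1) {
                    buf[i] = c2;
                }
            }
            out.write(buf, 0, n);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        replace(new StringReader("a;b\r\nc;d\n"), out, ';', ',');
        System.out.print(out); // a,b and c,d with the original \r\n and \n kept
    }
}
```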
FileWriter writer = new FileWriter("Report.csv");
BufferedReader reader = new BufferedReader(new InputStreamReader(YOURSOURCE, StandardCharsets.UTF_8));
String line;
while ((line = reader.readLine()) != null) {
line = line.replace(c1, c2); // c1, c2: the chars to swap; replace() returns a new String
writer.append(line);
writer.append('\n');
}
writer.flush();
writer.close();
You can find a related answer here: Filter (search and replace) array of bytes in an InputStream
I took @aioobe's answer in that thread and built a replacing input stream module in Java, which you can find in my GitHub gist: https://gist.github.com/lhr0909/e6ac2d6dd6752871eb57c4b083799947
Putting the source code here as well:
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Queue;
/**
* Created by simon on 8/29/17.
*/
public class ReplacingInputStream extends FilterInputStream {
private Queue<Integer> inQueue, outQueue;
private final byte[] search, replacement;
public ReplacingInputStream(InputStream in, String search, String replacement) {
super(in);
this.inQueue = new LinkedList<>();
this.outQueue = new LinkedList<>();
this.search = search.getBytes();
this.replacement = replacement.getBytes();
}
private boolean isMatchFound() {
Iterator<Integer> iterator = inQueue.iterator();
for (byte b : search) {
// Mask to 0..255 so bytes above 127 compare equal to what read() returns.
if (!iterator.hasNext() || (b & 0xFF) != iterator.next()) {
return false;
}
}
return true;
}
private void readAhead() throws IOException {
// Work up some look-ahead.
while (inQueue.size() < search.length) {
int next = super.read();
inQueue.offer(next);
if (next == -1) {
break;
}
}
}
@Override
public int read() throws IOException {
// Next byte already determined.
while (outQueue.isEmpty()) {
readAhead();
if (isMatchFound()) {
for (byte a : search) {
inQueue.remove();
}
for (byte b : replacement) {
outQueue.offer(b & 0xFF); // 0..255 like read(); a negative value could be mistaken for EOF
}
} else {
outQueue.add(inQueue.remove());
}
}
return outQueue.remove();
}
@Override
public int read(byte b[]) throws IOException {
return read(b, 0, b.length);
}
// copied straight from the InputStream implementation, just needed to use `read()` from this class
@Override
public int read(byte b[], int off, int len) throws IOException {
if (b == null) {
throw new NullPointerException();
} else if (off < 0 || len < 0 || len > b.length - off) {
throw new IndexOutOfBoundsException();
} else if (len == 0) {
return 0;
}
int c = read();
if (c == -1) {
return -1;
}
b[off] = (byte)c;
int i = 1;
try {
for (; i < len ; i++) {
c = read();
if (c == -1) {
break;
}
b[off + i] = (byte)c;
}
} catch (IOException ee) {
}
return i;
}
}
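For the question's exact case (one character c1 replaced by one character c2, both single bytes such as ASCII), no look-ahead queue is needed, and bulk reads can be passed straight through. A minimal sketch of that special case; SingleByteReplacingInputStream is a hypothetical name, and from/to are assumed to be byte values in 0..255.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class SingleByteReplacingInputStream extends FilterInputStream {
    private final int from; // byte value to replace, 0..255
    private final int to;   // replacement byte value, 0..255

    SingleByteReplacingInputStream(InputStream in, int from, int to) {
        super(in);
        this.from = from;
        this.to = to;
    }

    @Override
    public int read() throws IOException {
        int b = super.read(); // -1 at EOF, never equal to `from`
        return b == from ? to : b;
    }

    // FilterInputStream.read(byte[]) delegates here, so one override covers both.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len); // bulk read, no per-byte queue
        for (int i = off; i < off + n; i++) { // no iterations when n == -1
            if ((buf[i] & 0xFF) == from) {
                buf[i] = (byte) to;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new SingleByteReplacingInputStream(
            new ByteArrayInputStream("a;b;c".getBytes(StandardCharsets.US_ASCII)), ';', ',');
        byte[] buf = new byte[16];
        int n = in.read(buf);
        System.out.println(new String(buf, 0, n, StandardCharsets.US_ASCII)); // a,b,c
        in.close();
    }
}
```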
I am working on a program that reads 5 different files containing code that is improperly indented. I have to write a method that properly indents the code and prints it to the console and a new file, given a tab size and the names of the input and output files as parameters. My code so far runs through and indents every line and then tries to determine when to indent another tab or unindent.
public static void justifyJava( String inputFileName, String outputFileName,
int tabSize ) throws FileNotFoundException {
String one_tab = "";
for (int i = 0; i < tabSize; i++) {
one_tab += " ";
}
Scanner input = new Scanner( new File (inputFileName));
PrintStream out = new PrintStream ( new File (outputFileName));
int lineCount = 0;
while ( input.hasNextLine() ) {
String line = input.nextLine();
line = one_tab + line.trim();
lineCount++;
if (lineCount == 1){
line = line.substring(tabSize);
}
else if (lineCount == 2){
Scanner lineScan = new Scanner(line);
while (lineScan.hasNext()) {
String token = lineScan.next();
if (token.length() <= 2) {
line = line.substring(tabSize);
}
}
}
else if (line.contains("{") && lineCount > 2){
System.out.println(line);
out.println(line);
line = one_tab + input.nextLine();
while(!(line.contains("}"))){
line = one_tab + line;
System.out.println(line);
out.println(line);
line = input.nextLine();
}
line = one_tab + line;
}
else if (line.contains("}") && input.hasNextLine()){
line = one_tab + line;
}
else if (!(input.hasNextLine())) {
line = line.substring(tabSize);
}
System.out.println(line);
out.println(line);
}
}
This way is becoming very tedious because of how many situations I have to account for, especially since the code in these files uses different curly brace styles. Essentially, all I'm trying to do is indent every line that follows an opening curly brace by one tab and unindent every line that follows a closing curly brace by one tab. Is there an easier way to do this?
Determining "how many times" you have to indent a line is the same as knowing how many blocks of code are still open at that line. To this end, you detect a new block of code if:
The string contains an opening bracket {.
The string contains a control statement, e.g. if.
The second approach is harder, since you have to determine if the string is actually a control statement and not part of a variable name.
Hence, a simple program that does not cover every possible coding standard, but works pretty decently, goes like this:
Search for an opening bracket that does not belong to a comment.
When you find it, recursively call the method passing the new indentation size.
Return after finding the end of the code block.
Here is a minimal working example (MWE) that works for most simple cases. It detects opening and closing brackets outside string literals, and does not search inside comment lines.
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Scanner;
public class JavaIndent {
public static void main(String[] args) {
try {
JavaIndent.justify("path/to/input.java", "path/to/output.java", 4);
} catch (FileNotFoundException ex) {
System.out.println("File not found...");
}
}
public static void justify(String inputFileName, String outputFileName,
int tabSize) throws FileNotFoundException {
String one_tab = "";
for (int i = 0; i < tabSize; i++) {
one_tab += " ";
}
Scanner input = new Scanner(new File(inputFileName));
PrintStream out = new PrintStream(new File(outputFileName));
JavaIndent.justifyRecursion(one_tab, "", input, out);
input.close();
out.close();
}
private static String justifyRecursion(String base_tab, String tab, Scanner input, PrintStream out) {
String line;
boolean flag_open, flag_close, flag_comment, flag_empty;
while (input.hasNextLine()) {
line = input.nextLine().trim();
flag_open = JavaIndent.contains(line, "{");
flag_close = JavaIndent.contains(line, "}");
flag_empty = line.length() == 0;
flag_comment = (flag_empty) ? false : line.charAt(0) == '/';
if (flag_comment || flag_empty) {
out.println(tab + line);
} else if (flag_close) {
return line;
} else if (flag_open) {
out.println(tab + line);
line = JavaIndent.justifyRecursion(base_tab, tab + base_tab, input, out);
out.println(tab + line);
// Handles statements like } else { and sequences of these.
flag_open = JavaIndent.contains(line, "{");
while (flag_open) {
line = JavaIndent.justifyRecursion(base_tab, tab + base_tab, input, out);
out.println(tab + line);
flag_open = JavaIndent.contains(line, "{");
}
} else {
// Just a regular line, nothing special
out.println(tab + line);
}
}
return "";
}
private static boolean contains(String line, String sequence) {
String current = "";
char ch, last_ch = ' ';
int count_quotation = 0;
ArrayList<String> code_without_strings = new ArrayList<>();
for (int k = 0; k < line.length(); ++k) {
ch = line.charAt(k);
if (ch == '"' && count_quotation == 0 && last_ch != '\'') {
code_without_strings.add(current);
current = "";
++count_quotation;
} else if (ch == '"' && count_quotation == 1) {
if (last_ch != '\\') {
count_quotation = 0;
}
}
if (count_quotation == 0) {
current += ch;
}
last_ch = ch;
}
code_without_strings.add(current);
for (String code : code_without_strings) {
if (code.contains(sequence))
return true;
}
return false;
}
}
However, one still needs to consider statements such as this:
if (condition)
System.out.println("This should be indented, but it won't be...");
and this:
/**
* This is just a comment, but the program will indent from here on {.
*/
Try using JavaIndent to indent JavaIndent.java and verify that at the very end you will get
if (code.contains(sequence))
return true;
instead of
if (code.contains(sequence))
    return true;
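The depth-counting idea from the start of this answer can also be sketched without recursion: keep a counter of open blocks, dedent a line that starts with '}' before printing it (which also covers "} else {"), and apply the remaining braces on the line to the lines that follow. This is a hypothetical BraceIndenter class assuming Java 11+ (String.repeat). Like the MWE, it does not indent brace-less if bodies; unlike the MWE, it does not skip braces inside strings or comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BraceIndenter {
    // Re-indents lines by brace depth, using tabSize spaces per level.
    static List<String> indent(List<String> lines, int tabSize) {
        String unit = " ".repeat(tabSize);
        List<String> out = new ArrayList<>();
        int depth = 0;
        for (String raw : lines) {
            String line = raw.trim();
            // Leading '}' characters close blocks before this line is printed.
            int leading = 0;
            while (leading < line.length() && line.charAt(leading) == '}') {
                leading++;
            }
            depth = Math.max(0, depth - leading);
            out.add(line.isEmpty() ? "" : unit.repeat(depth) + line);
            // Any other braces on the line change the depth of the lines after it.
            for (int i = leading; i < line.length(); i++) {
                char c = line.charAt(i);
                if (c == '{') {
                    depth++;
                } else if (c == '}') {
                    depth = Math.max(0, depth - 1);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> messy = Arrays.asList(
            "class A {", "void f() {", "if (x) {", "y();",
            "} else {", "z();", "}", "}", "}");
        indent(messy, 4).forEach(System.out::println);
    }
}
```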
Given the following:
import java.io.*;
public class WriteRead {
public void writeToFile(String filename) throws IOException {
FileWriter fw = new FileWriter(filename);
fw.write("testing");
fw.close();
}
public String readFromFile(String filename) throws IOException {
String str = "";
int characterInt = 0;
FileReader fr = new FileReader(filename);
while (characterInt != -1) {
characterInt = fr.read();
str += "" + (char) characterInt;
}
fr.close();
return str;
}
public static void main(String[] args) throws IOException {
WriteRead wR = new WriteRead();
wR.writeToFile("test.java");
System.out.println(wR.readFromFile("test.java"));
}
}
When I run the program, it prints the following:
What is the symbol at the end of "testing" and what part of the program causes it to be there?
Use the following code:
while ((characterInt = fr.read()) != -1) {
str += "" + (char) characterInt;
}
In the last iteration, read() returns -1, and (char) -1 is '\uFFFF', which gets appended to str (it typically displays as ? or a similar replacement glyph). So check the value before appending it. As for the "testing0" display you mention, please post its hex code first.
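A small sketch showing where the extra symbol comes from: (char) -1 is '\uFFFF' (65535), so the buggy loop appends one extra char before its condition is re-checked, while the fixed loop stops first. A StringReader stands in for the file, and EofCharDemo, readBuggy and readFixed are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class EofCharDemo {
    // The loop from the question: the condition is checked before read(),
    // so the -1 returned at end of stream is cast and appended.
    static String readBuggy(Reader fr) throws IOException {
        String str = "";
        int characterInt = 0;
        while (characterInt != -1) {
            characterInt = fr.read();
            str += "" + (char) characterInt;
        }
        return str;
    }

    // Fixed loop: read, test against -1, and only then use the value.
    static String readFixed(Reader fr) throws IOException {
        StringBuilder sb = new StringBuilder();
        int characterInt;
        while ((characterInt = fr.read()) != -1) {
            sb.append((char) characterInt);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println((int) (char) -1); // 65535
        System.out.println(readBuggy(new StringReader("testing")).length()); // 8
        System.out.println(readFixed(new StringReader("testing")).length()); // 7
    }
}
```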
Make the changes as below:
while ((characterInt = fr.read()) != -1) {
//characterInt = fr.read();
str += "" + (char) characterInt;
}
Read the char and compare it with -1 (end of stream) before using it. You were appending the end-of-file value to the string first, and only then did the while-loop condition fail.
You are setting characterInt = fr.read(); inside your while loop. You should set it either before the loop or in the while() condition.
With your code, the condition is checked against the previous value (the last character read, 'g'), which is not -1, so the body runs once more: read() returns -1 and (char) -1 is appended before the loop exits.
I am trying to read a text file which contains:
hello James!
How are you today!
I am using the below code:
int readedValue;
do
{
while((readedValue = bufferReader.read()) != 10)
{
//readedValue = bufferReader.read();
char ch = (char) readedValue;
System.out.print(ch);
}
}
while ((readedValue = bufferReader.read()) != -1);
When I read the file now, I get output as hello James!ow are you today!
I am not getting 'H'ow in How. How can I alter this to get the complete text?
You're losing a character in your do-while loop's conditions
do {
...
// ends when the first new line \n character is reached
}
while ((readedValue = bufferReader.read()) != -1);
^ never printed
// the character that isn't read is the first character after the \n, ie. 'H'
Use a single loop that stores the value in readedValue (better named readValue) and does every comparison on that stored value.
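Following that advice, a sketch that reads each char exactly once, stores it in readValue and makes every comparison on that stored value. Instead of printing, it splits the text into lines at '\n', which keeps the line structure and makes the result easy to check. LineSplitter and splitLines are hypothetical names, and a StringReader stands in for the file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class LineSplitter {
    // Reads every char exactly once and splits on '\n' (a '\r' before it is kept).
    static List<String> splitLines(Reader in) throws IOException {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int readValue;
        while ((readValue = in.read()) != -1) {
            if (readValue == '\n') {
                lines.add(current.toString());
                current.setLength(0);
            } else {
                current.append((char) readValue);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString()); // last line had no '\n'
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = splitLines(new StringReader("hello James!\nHow are you today!"));
        lines.forEach(System.out::println);
    }
}
```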
I think you need this one...
int readedValue;
while ((readedValue = bufferReader.read()) != -1)
{
if(readedValue != 10)
{
System.out.print((char) readedValue);
}
}
In your example you read an extra character whenever a line feed (ASCII 10) is encountered:
do
{
while((readedValue = bufferReader.read()) != 10) // Here
{
//readedValue = bufferReader.read();
char ch = (char) readedValue;
System.out.print(ch);
}
}
while ((readedValue = bufferReader.read()) != -1); // Again here
What you should do is read each character only once:
while ((readedValue = bufferReader.read()) != -1)
{
if(readedValue != 10)
{
char ch = (char) readedValue;
System.out.print(ch);
}
}
This would help you:
String line;
int readedValue;
String s = "hello James!\r\nHow are you today!";
StringReader input = new StringReader(s);
BufferedReader lineReader= new BufferedReader (input);
while((line=lineReader.readLine())!=null) {
StringReader input2 = new StringReader(line);
BufferedReader charReader= new BufferedReader (input2);
while((readedValue = charReader.read()) != -1) {
char ch = (char) readedValue;
System.out.print(ch);
}
}