I have a file and I want to read in the string on each line. If the line does not end in CRLF (\r\n), I want to print something. I made this file by redirecting output from print commands similar to the following.
System.out.println("Test\r\n");
But when I read this line in from the file using a BufferedReader, it doesn't seem to catch the CRLF.
I use the following to detect the CRLF (where inputline is the line that has been read in).
if(inputline.indexOf("\r\n")<0)
It never detects the \r\n. How can I remedy this? Is this an issue with BufferedReader?
readLine
public String readLine()
throws IOException
Read a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
Returns:
A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
from http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
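To see what this means in practice, here is a minimal sketch (using an in-memory StringReader to stand in for the file): the terminator is consumed by readLine and never appears in the returned String, so indexOf("\r\n") can never match.

```java
import java.io.*;

public class StrippedTerminator {
    public static void main(String[] args) throws IOException {
        // Stand-in for the file contents written by println("Test\r\n").
        BufferedReader br = new BufferedReader(new StringReader("Test\r\n"));
        String line = br.readLine();
        // The terminator is consumed by readLine, not returned:
        System.out.println(line + "|" + line.indexOf("\r\n")); // prints Test|-1
    }
}
```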
Thus you may need to write some of your own code (or take this, borrowed from http://www.coderanch.com/t/276442//java/Reading-file-byte-array)
private byte[] toByteArray(File file) throws FileNotFoundException, IOException {
    int length = (int) file.length();
    byte[] array = new byte[length];
    InputStream in = new FileInputStream(file);
    int offset = 0;
    while (offset < length) {
        int count = in.read(array, offset, (length - offset));
        if (count == -1) break; // end of stream reached early; avoid looping forever
        offset += count;
    }
    in.close();
    return array;
}
This will give you all the bytes - nothing stripped. Knock yourself out looking for \r\n...
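For example, once you have the raw bytes, you can check each line's ending yourself. This is a minimal sketch using an in-memory byte array in place of the file's bytes; the contents are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class CrlfCheck {
    public static void main(String[] args) {
        // Stand-in for the bytes returned by toByteArray(file):
        // first line ends in \r\n, second line ends in a bare \n.
        byte[] bytes = "Test\r\nBroken\n".getBytes(StandardCharsets.US_ASCII);
        int lineStart = 0;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\n') {
                boolean crlf = i > lineStart && bytes[i - 1] == '\r';
                String line = new String(bytes, lineStart, i - lineStart,
                        StandardCharsets.US_ASCII).replace("\r", "");
                if (!crlf) {
                    System.out.println("No CRLF on line: " + line);
                }
                lineStart = i + 1;
            }
        }
    }
}
```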
You can use java.util.Scanner which knows how to find lines in a file (or text)
Scanner sc = new Scanner(new File("filename"));
while (sc.hasNextLine()) {
    System.out.println(sc.nextLine());
}
Related
I need to read from a file that contains 9000 words. What is the best way to read from this file, and what is the difference between BufferedReader and a regular Scanner? Or is there another good class to use?
Thanks
If you are doing "efficient" reading, there is no benefit to buffering. If, on the other hand, you are doing "inefficient" reading, then having a buffer will improve performance.
What do I mean by "efficient" reading? Efficient reading means reading bytes off of the InputStream / Reader as fast as they appear. Imagine you wanted to load a whole text file to display in an IDE or other editor. "Inefficient" reading is when you are reading information off of the stream piecemeal - i.e. Scanner.nextDouble() is inefficient reading, as it reads in a few bytes (until the double's digits end), then transforms the number from text to binary. In this case, having a buffer improves performance, as the next call to nextDouble() will read out of the buffer (memory) instead of disk.
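As a small sketch of the piecemeal case (using an in-memory string of made-up numbers in place of a file): each Scanner call consumes only one token's worth of characters, and wrapping the source in a BufferedReader serves those small reads from memory.

```java
import java.io.*;
import java.util.Scanner;

public class BufferDemo {
    public static void main(String[] args) {
        // In-memory stand-in for a file of numbers (hypothetical data).
        String data = "10 20 30";
        // Scanner.nextInt() is "inefficient" reading: it pulls a few
        // characters at a time to parse each token. The BufferedReader
        // wrapper means those small reads come from an in-memory buffer.
        Scanner sc = new Scanner(new BufferedReader(new StringReader(data)));
        int sum = 0;
        while (sc.hasNextInt()) {
            sum += sc.nextInt(); // each call consumes only one token
        }
        System.out.println(sum); // prints 60
    }
}
```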
If you have any questions on this, please ask
Open the file using an input stream. Then read its content to a string using this code:
public static void main(String args[]) throws IOException {
    FileInputStream in = new FileInputStream("input.txt");
    String text = inputStreamToString(in);
    in.close();
}

// Reads an InputStream and converts it to a String.
public static String inputStreamToString(InputStream stream) throws IOException {
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    byte[] buffer = new byte[1024];
    int length;
    while ((length = stream.read(buffer)) != -1)
        byteArrayOutputStream.write(buffer, 0, length);
    return byteArrayOutputStream.toString("UTF-8");
}
Check this answer for comparisons between buffered readers:
Read/convert an InputStream to a String
I normally use Scanners when I want to read a file line by line, or based on a delimiter. For example:
try {
    File file = new File("file.txt");
    Scanner fileScanner = new Scanner(file);
    while (fileScanner.hasNextLine()) {
        String line = fileScanner.nextLine();
        System.out.println(line);
    }
    fileScanner.close();
} catch (Exception ex) {
    ex.printStackTrace();
}
To scan based on a delimiter, you can use something similar to this:
fileScanner.useDelimiter("\\s"); // \s matches any whitespace
while (fileScanner.hasNext()) {
    // do something with scanned data
    String word = fileScanner.next();
    // Double num = fileScanner.nextDouble();
}
I know I can use AsynchronousFileChannel to read an entire file to a String:
AsynchronousFileChannel fileChannel = AsynchronousFileChannel.open(filePath, StandardOpenOption.READ);
long len = fileChannel.size();
ReadAttachment readAttachment = new ReadAttachment();
readAttachment.byteBuffer = ByteBuffer.allocate((int) len);
readAttachment.asynchronousChannel = fileChannel;
CompletionHandler<Integer, ReadAttachment> completionHandler = new CompletionHandler<Integer, ReadAttachment>() {
    @Override
    public void completed(Integer result, ReadAttachment attachment) {
        String content = new String(attachment.byteBuffer.array());
        try {
            attachment.asynchronousChannel.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        completeCallback.accept(content);
    }

    @Override
    public void failed(Throwable exc, ReadAttachment attachment) {
        exc.printStackTrace();
        exceptionError(errorCallback, completeCallback, String.format("error while reading file [%s]: %s", path, exc.getMessage()));
    }
};
fileChannel.read(
        readAttachment.byteBuffer,
        0,
        readAttachment,
        completionHandler);
Suppose that now I don't want to allocate an entire ByteBuffer, but instead read line by line. I could use a ByteBuffer of fixed size and keep calling read, always copying and appending to a StringBuffer until I get to a new line... My only concern is: because the encoding of the file that I am reading could be multi-byte per character (some UTF encoding), the read bytes may end with an incomplete character. How can I make sure that I'm converting the right bytes into strings and not messing up the encoding?
UPDATE: answer is in the comment of the selected answer, but it basically points to CharsetDecoder.
If you have a clear ASCII separator, which you have in your case (\n), you won't need to care about incomplete characters, as this character maps to a single byte (and vice versa).
So just search for the '\n' byte in your input, and read and convert everything before it into a String. Loop until no more newlines are found. Then compact the buffer and reuse it for the next read. If you don't find a newline, you'll have to allocate a bigger buffer, copy the contents of the old one, and only then call read again.
EDIT: As mentioned in the comment, you can pass the ByteBuffer to a CharsetDecoder on the fly and translate it into a CharBuffer (then append to a StringBuilder or whatever your preferred solution is).
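A minimal sketch of that CharsetDecoder approach (buffer sizes and the sample bytes are arbitrary): when decode is called with endOfInput=false, any incomplete trailing character is left in the ByteBuffer, so compacting and refilling it never splits a character.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

public class DecodeChunks {
    public static void main(String[] args) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        // "é" is two bytes in UTF-8; deliberately split them across reads.
        byte[] all = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // 5 bytes
        ByteBuffer bb = ByteBuffer.allocate(16);
        CharBuffer cb = CharBuffer.allocate(16);
        int split = 4; // cut in the middle of the two-byte sequence
        // First "read": bytes 0..3 (only the first byte of 'é').
        bb.put(all, 0, split).flip();
        decoder.decode(bb, cb, false); // false: more input is coming
        bb.compact();                  // keep the undecoded trailing byte
        // Second "read": the remaining bytes complete the character.
        bb.put(all, split, all.length - split).flip();
        decoder.decode(bb, cb, true);  // true: end of input
        decoder.flush(cb);
        cb.flip();
        System.out.println(cb.toString()); // prints café
    }
}
```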
Try Scanner:
Scanner sc = new Scanner(FileChannel.open(filePath, StandardOpenOption.READ));
String line = sc.nextLine();
FileChannel is InterruptibleChannel
I need to take a 10000 character string as input from the user in a Java program. But when I use the normal way, it gives an NZEC error on ideone and SPOJ. How can I take such a string as input?
import java.io.*;

class st
{
    public static void main(String args[]) throws IOException
    {
        String a;
        BufferedReader g = new BufferedReader(new InputStreamReader(System.in));
        a = g.readLine();
    }
}
BufferedReader uses a buffer that is large enough "for most purposes". 10000 characters is probably too large. Since you're using readLine, the reader is scanning characters read, looking for an end of line. After its internal buffer is full, and it still hasn't found an end of line, it throws an exception.
You could try setting the size of the buffer when you create the BufferedReader:
BufferedReader g=new BufferedReader(new InputStreamReader(System.in), 10002);
Or you could use
BufferedReader.read(char[] cbuf, int off, int len)
instead. That would give you an array of char, and you'd need to convert it back to a String.
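A sketch of that read(char[], int, int) approach (here a StringReader stands in for new InputStreamReader(System.in), and the limit of 10000 comes from the question). Note that read may return fewer characters than requested, so it has to loop:

```java
import java.io.*;

public class ReadChars {
    // Reads up to max characters from the reader into a String.
    static String readUpTo(Reader reader, int max) throws IOException {
        char[] cbuf = new char[max];
        int off = 0;
        while (off < max) {
            // read() may fill only part of the requested range, so loop.
            int n = reader.read(cbuf, off, max - off);
            if (n == -1) break; // end of stream
            off += n;
        }
        return new String(cbuf, 0, off);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new InputStreamReader(System.in):
        Reader in = new StringReader("hello world");
        System.out.println(readUpTo(in, 10000)); // prints hello world
    }
}
```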
Just read until the buffer is full.
byte[] buffer = new byte[10000];
DataInputStream dis = new DataInputStream(System.in);
dis.readFully(buffer);
// Once you get here, the buffer is filled with the input of stdin.
String str = new String(buffer);
Take a look at Runtime Error (NZEC) in simple code to understand possible reasons for the error message.
I suggest you wrap the readLine() in a try/catch block and print the error message / stack trace.
My Question: How do I force an input stream to process line separators as the system standard line separator?
I read a file to a string and the newlines get converted to \n but my System.getProperty("line.separator"); is \r\n. I want this to be portable, so I want my file reader to read the newlines as the system standard newline character (whatever that may be). How can I force it? Here are my methods from the Java Helper Library to read the file in as a string.
/**
 * Takes the file and returns it in a string. Uses UTF-8 encoding
 *
 * @param fileLocation
 * @return the file in String form
 * @throws IOException when trying to read from the file
 */
public static String fileToString(String fileLocation) throws IOException {
    InputStreamReader streamReader = new InputStreamReader(new FileInputStream(fileLocation), "UTF-8");
    return readerToString(streamReader);
}
/**
 * Returns all the lines in the Reader's stream as a String
 *
 * @param reader
 * @return
 * @throws IOException when trying to read from the file
 */
public static String readerToString(Reader reader) throws IOException {
    StringWriter stringWriter = new StringWriter();
    char[] buffer = new char[1024];
    int length;
    while ((length = reader.read(buffer)) > 0) {
        stringWriter.write(buffer, 0, length);
    }
    reader.close();
    stringWriter.close();
    return stringWriter.toString();
}
Your readerToString method doesn't do anything to line endings. It simply copies character data - that's all. It's entirely unclear how you're diagnosing the problem, but that code really doesn't change \n to \r\n. It must be \r\n in the file - which you should look at in a hex editor. What created the file in the first place? You should look there for how any line breaks are represented.
If you want to read lines, use BufferedReader.readLine() which will cope with \r, \n or \r\n.
Note that Guava has a lot of helpful methods for reading all the data from readers, as well as splitting a reader into lines etc.
It's advisable to use BufferedReader for reading a file line-by-line in a portable way, and then you can use each of the lines read for writing to the required output using the line separator of your choice
With the Scanner#useDelimiter method you can specify what delimiter to use when reading from a File or InputStream or whatever.
You can use a BufferedReader to read the file line by line and convert the line separators, e.g.:
public static String readerToString(Reader reader) throws IOException {
    BufferedReader bufReader = new BufferedReader(reader);
    StringBuilder stringBuf = new StringBuilder();
    String separator = System.getProperty("line.separator");
    String line = null;
    while ((line = bufReader.readLine()) != null) {
        stringBuf.append(line).append(separator);
    }
    bufReader.close();
    return stringBuf.toString();
}
I'm trying to find the number of characters in a given text file.
I've tried using both a scanner and a BufferedReader, but I get conflicting results. With the use of a scanner I concatenate every line after I append a new line character. E.g. like this:
FileReader reader = new FileReader("sampleFile.txt");
Scanner lineScanner = new Scanner(reader);
String totalLines = "";
while (lineScanner.hasNextLine()) {
    String line = lineScanner.nextLine() + '\n';
    totalLines += line;
}
System.out.println("Count " + totalLines.length());
This returns the true character count for my file, which is 5799
Whereas when I use:
BufferedReader in = new BufferedReader(new FileReader("sample.txt"));
int i;
int count = 0;
while ((i = in.read()) != -1) {
    count++;
}
System.out.println("Count " + count);
I get 5892.
I know using the lineScanner will be off by one if there is only one line, but for my text file I get the correct output.
Also in notepad++ the file length in bytes is 5892 but the character count without blanks is 5706.
Your file may have lines terminated with \r\n rather than \n. That could cause your discrepancy.
You have to consider the newline/carriage returns character in a text file. This also counts as a character.
I would suggest using the BufferedReader as it will return more accurate results.
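The discrepancy can be reproduced in memory (a sketch with made-up CRLF-terminated content): rebuilding each line with a single '\n' loses one character per CRLF line break, while reading character by character counts everything.

```java
import java.io.*;
import java.util.Scanner;

public class CountDemo {
    public static void main(String[] args) throws IOException {
        // Hypothetical file contents with CRLF endings: 17 characters total.
        String content = "one\r\ntwo\r\nthree\r\n";
        // Scanner approach: terminators are stripped, then a lone \n is
        // appended, undercounting by one per CRLF.
        Scanner sc = new Scanner(new StringReader(content));
        int scannerCount = 0;
        while (sc.hasNextLine()) {
            scannerCount += (sc.nextLine() + '\n').length();
        }
        // Raw approach: count every character, including \r and \n.
        Reader r = new StringReader(content);
        int rawCount = 0;
        while (r.read() != -1) rawCount++;
        System.out.println(scannerCount + " vs " + rawCount); // prints 14 vs 17
    }
}
```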