Skip a line using BufferedReader (skip, but not read it) - java

Hi guys, I am currently using BufferedReader to read files, with a loop along the lines of:
while (br.readLine() != null) { ... }
Now, what should I do if I want to skip a line? I've read several similar questions other people posted, and most of them suggested calling readLine().
I know calling readLine() once will move the pointer to the next line, but that is not what I want, because I care about reading performance here. Although it looks like you skip a line, the system actually reads it anyway, so it is not time-efficient. What I want is to move the pointer to the next line without reading it.
Any good ideas?

It's not possible to skip the line without reading it.
In order to know where to skip to, you have to know where the next new line character is, so you have to read it.
P.S. Unless you have a good reason not to, BufferedReader should be fine for you - it's quite efficient

You'll sometimes have a problem skipping the first line with code analysis tools (Sonar etc.) complaining that you didn't use the value. In the case of CSV, you may want to skip (i.e. not use) the first row if it's a header row. In that case, you could stream the lines and skip the first one:
try (BufferedReader reader = new BufferedReader(new FileReader(filename))) {
    Stream<String> lines = reader.lines().skip(1);
    lines.forEachOrdered(line -> {
        ...
    });
}

If you care about the memory wasted in the intermediate StringBuffer, you can try the following implementation:
public static void skipLine(BufferedReader br) throws IOException {
    while (true) {
        int c = br.read();
        if (c == -1 || c == '\n')   // end of stream or a plain '\n' terminator
            return;
        if (c == '\r') {            // '\r' alone or the start of "\r\n"
            br.mark(1);
            c = br.read();
            if (c != '\n')
                br.reset();         // lone '\r': push the extra char back
            return;
        }
    }
}
Seems that it works nicely for all the EOLs supported by readLine ('\n', '\r', '\r\n').
As an alternative you may extend the BufferedReader class adding this method as instance method inside it.
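For example, such a subclass might look roughly like this (just a sketch; the class name is mine, not anything standard):
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class SkippingBufferedReader extends BufferedReader {

    public SkippingBufferedReader(Reader in) {
        super(in);
    }

    /** Advances past the next line terminator without building a String. */
    public void skipLine() throws IOException {
        while (true) {
            int c = read();
            if (c == -1 || c == '\n')
                return;
            if (c == '\r') {
                mark(1);
                if (read() != '\n')
                    reset();        // lone '\r': push the extra char back
                return;
            }
        }
    }
}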

In general, it's not possible in any language and with any API.
To skip a line you need to know the next line's offset. Either you have a file format that provides that information, or you are given line offsets as input; otherwise you have to read every single byte just to know where one line ends and the next begins.
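If you do happen to have the offsets (say, from an index file), then you can genuinely skip without reading. A rough sketch using RandomAccessFile, with a made-up file name and made-up offsets:
import java.io.IOException;
import java.io.RandomAccessFile;

public class SeekExample {
    public static void main(String[] args) throws IOException {
        // Hypothetical: lineOffsets[i] is the known byte offset where line i starts.
        long[] lineOffsets = {0L, 128L, 259L};
        try (RandomAccessFile raf = new RandomAccessFile("data.txt", "r")) {
            raf.seek(lineOffsets[2]);          // jump straight to the third line
            String thirdLine = raf.readLine(); // only this line is actually read
            System.out.println(thirdLine);
        }
    }
}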

Have you read the code of readLine()? It reads chars one by one until it finds a \r or a \n, appending them to a StringBuffer. What you want is this behaviour without the burden of creating a StringBuffer. Just extend the BufferedReader class and provide your own implementation which simply reads chars until \r or \n without building the useless String.
The only problem I see is that many fields are private...

Related

Reading all content of a Java BufferedReader including the line termination characters

I'm writing a TCP client that receives some binary data and sends it to a device. The problem arises when I use BufferedReader to read what it has received.
I'm extremely puzzled by finding out that there is no method available to read all the data. The readLine() method that everybody is using, detects both \n and \r characters as line termination characters, so I can't get the data and concat the lines, because I don't know which char was the line terminator. I also can't use read(buf, offset, num), because it doesn't return the number of bytes it has read. If I read it byte by byte using read() method, it would become terribly slow. Please someone tell me what is the solution, this API seems quite stupid to me!
Well, first of all thanks to everyone. I think the main problem was that I had read tutorialspoint instead of the Java documentation. Pardon me for that: I live in Iran, and Oracle doesn't let us access the documentation, for whatever reason. Thanks anyway for the patient and helpful responses.
This is more than likely an XY problem.
The beginning of your question reads:
I'm writing a TCP client that receives some binary data and sends it to a device. The problem arises when I use BufferedReader to read what it has received.
This is binary data; do not use a Reader to start with! A Reader wraps an InputStream using a Charset and yields a stream of chars, not bytes. See, among other sources, here for more details.
Next:
I'm extremely puzzled by finding out that there is no method available to read all the data
With reason. There is no telling how large the data may be, and as a result such a method would be fraught with problems if the data you receive is too large.
So, now that using a Reader is out of the way, what you really need to do is this:
read some binary data from a Socket;
copy this data to another source.
The solutions to do that are many; here is one solution which requires nothing but the standard JDK (7+):
final byte[] buf = new byte[8192]; // or other
try (
    final InputStream in = theSocket.getInputStream();
    final OutputStream out = whatever();
) {
    int nrBytes;
    while ((nrBytes = in.read(buf)) != -1)
        out.write(buf, 0, nrBytes);
}
Wrap this code in a method or whatever etc.
I'm extremely puzzled by finding out that there is no method available to read all the data.
There are three.
The readLine() method that everybody is using, detects both \n and \r characters as line termination characters, so I can't get the data and concat the lines, because I don't know which char was the line terminator.
Correct. It is documented to suppress the line terminator.
I also can't use read(buf, offset, num), because it doesn't return the number of bytes it has read.
It returns the number of chars read.
If I read it byte by byte using read() method, it would become terribly slow.
That reads it char by char, not byte by byte, but you're wrong about the performance. It's buffered.
Please someone tell me what is the solution
You shouldn't be using a Reader for binary data in the first place. I can only suggest you re-read the Javadoc for:
BufferedInputStream.read() throws IOException;
BufferedInputStream.read(byte[]) throws IOException;
BufferedInputStream.read(byte[], int, int) throws IOException;
The last two both return the number of bytes read, or -1 at end of stream.
this API seems quite stupid to me!
No comment.
In the first place, everyone who reads data has to plan for \n, \r and \r\n as possible sequences, except when parsing HTTP headers, which must be separated with \r\n. You could easily read line by line and output whatever line separator you like.
Secondly, the read method returns the number of characters it has read into a char[], so it works exactly as needed if you want to read a chunk of chars and do your own line parsing and output.
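A rough sketch of that approach (host, port and buffer size are made up; the terminator handling is just one way to do it):
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;

public class ChunkReadExample {
    public static void main(String[] args) throws IOException {
        // Hypothetical host/port; adjust to your setup.
        try (Socket socket = new Socket("localhost", 12345);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            char[] buf = new char[4096];
            StringBuilder line = new StringBuilder();
            boolean lastWasCR = false;
            int n;
            while ((n = in.read(buf)) != -1) {      // n = number of chars actually read
                for (int i = 0; i < n; i++) {
                    char c = buf[i];
                    if (c == '\n') {
                        if (!lastWasCR) {           // "\r\n" was already handled at the '\r'
                            System.out.println(line);
                            line.setLength(0);
                        }
                        lastWasCR = false;
                    } else if (c == '\r') {
                        System.out.println(line);   // terminator was '\r' (possibly "\r\n")
                        line.setLength(0);
                        lastWasCR = true;
                    } else {
                        line.append(c);
                        lastWasCR = false;
                    }
                }
            }
        }
    }
}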
The best thing I can recommend is that you use BufferedReader.read() and iterate over every character in the input. Something like this:
String filename = ...
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(filename)));
StringBuilder l = new StringBuilder();
while (true) {
    int c = br.read();
    if (c == -1) {
        // end of stream: handle whatever is left in l, then stop
        break;
    }
    if (c == '\n') {
        // line terminated by '\n'
        // do stuff, not sure what you want with the endl encoding
        l.setLength(0);
    } else if (c == '\r') {
        // line terminated by '\r', possibly followed by '\n'
        br.mark(1);
        int ctwo = br.read();
        if (ctwo != '\n') {
            br.reset();     // lone '\r': put the extra char back
        } else {
            // do extra stuff since you know that you've got a "\r\n"
        }
        // do stuff, not sure what you want with the endl encoding
        l.setLength(0);
    } else {
        l.append((char) c);
    }
    ...
}
previously answered by https://stackoverflow.com/users/615234/arrdem

Java File Replace Lines

I have a 250 GB .txt file and only 50 GB of space left on my hard drive.
Every line in this .txt file has a long prefix, and I want to delete this prefix to make the file smaller.
First I wanted to read line by line, change each line and write it into another file:
// read line out of first file
line = line.replace(prefix, "");
// write line into second file
The problem is I don't have enough space for that.
So how can I delete all the prefixes from my file?
Check RandomAccessFile: http://docs.oracle.com/javase/7/docs/api/java/io/RandomAccessFile.html
You have to keep track of the position you are reading from and the position you are writing to. Initially both are at the start. Then you read N bytes (one line), shorten it, seek back N bytes and write M bytes (the shortened line). Then you seek forward (N - M) bytes to get back to the position where next line starts. Then you do this over and over again. In the end truncate excess with setLength(long).
You can also do it in batches (like read 4kb, process, write, repeat) to make it more efficient.
The process is identical in all languages. Some make it easier by hiding the seeking back and forth behind an API.
Of course you have to be absolutely sure that your program works flawlessly, since there is no way to undo this process.
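A bare-bones sketch of that read/write pointer dance, assuming a single-byte encoding, '\n' line endings and that every line really does get shorter (untested, no error handling; class and argument layout are mine):
import java.io.IOException;
import java.io.RandomAccessFile;

public class InPlaceStrip {
    public static void main(String[] args) throws IOException {
        String prefix = args[1];                  // the prefix to strip from every line
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "rw")) {
            long originalLength = raf.length();
            long readPos = 0;                     // where the next unread line starts
            long writePos = 0;                    // where the next shortened line goes
            while (readPos < originalLength) {
                raf.seek(readPos);
                String line = raf.readLine();     // decodes bytes as ISO-8859-1
                if (line == null) break;
                readPos = raf.getFilePointer();   // remember the start of the next line
                byte[] out = (line.replace(prefix, "") + "\n").getBytes("ISO-8859-1");
                raf.seek(writePos);
                raf.write(out);                   // never overtakes readPos while lines shrink
                writePos += out.length;
            }
            raf.setLength(writePos);              // chop off the leftover tail
        }
    }
}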
Also, the RandomAccessFile is a bit limited, since it only deals in byte positions, not character positions. Therefore you have to do the conversion between "decoded strings" and "encoded bytes" as you go. If your file is in UTF-8, a given character in the string can take one or many bytes in the file, so you can't just do seek(string.length()). You have to use seek(string.getBytes(encoding).length) and factor in possible line break conversions (Windows uses two characters for a line break, Unix uses only one). But if you have ASCII, ISO-Latin-1 or a similarly trivial character encoding and know what line break chars the file has, the problem should be pretty simple.
And as I edit my answer to cover all possible corner cases, I think it would be better to read the file using a BufferedReader with the correct character encoding, and also open a RandomAccessFile for doing the writing, if your OS supports having the same file open twice. This way you get complete Unicode support from BufferedReader and you don't have to keep track of read and write positions yourself. You have to do the writing with RandomAccessFile because using a Writer on the file may just truncate it (haven't tried it, though).
Something like this. It works on trivial examples but it has no error checking and I absolutely give no guarantees. Test it on a smaller file first.
public static void main(String[] args) throws IOException {
    File f = new File(args[0]);
    BufferedReader reader = new BufferedReader(new InputStreamReader(
            new FileInputStream(f), "UTF-8")); // Use correct encoding here.
    RandomAccessFile writer = new RandomAccessFile(f, "rw");
    String line = null;
    long totalWritten = 0;
    while ((line = reader.readLine()) != null) {
        line = line.trim() + "\n"; // Remove your prefix here.
        byte[] b = line.getBytes("UTF-8");
        writer.write(b);
        totalWritten += b.length;
    }
    reader.close();
    writer.setLength(totalWritten);
    writer.close();
}
You can use RandomAccessFile. That allows you to overwrite parts of the file. And since there is no copy- or caching-mechanism mentioned in the javadoc this should work without additional disk-space.
So you could overwrite the unwanted parts with spaces.
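A rough sketch of that idea, assuming every line starts with the same fixed-length prefix and a single-byte encoding (file name and prefix length are made up; note the file keeps its size, the prefixes are just blanked out):
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

public class BlankPrefixes {
    public static void main(String[] args) throws IOException {
        int prefixLength = 10;                      // hypothetical prefix length in bytes
        byte[] spaces = new byte[prefixLength];
        Arrays.fill(spaces, (byte) ' ');
        try (RandomAccessFile raf = new RandomAccessFile("big.txt", "rw")) {
            long lineStart = 0;
            while (lineStart < raf.length()) {
                raf.seek(lineStart);
                raf.write(spaces);                  // overwrite the prefix with spaces
                raf.readLine();                     // skip to the end of this line
                lineStart = raf.getFilePointer();   // start of the next line
            }
        }
    }
}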
Split the 250 GB file into 5 files of 50 GB each. Then process each file and then delete it. This way you will always have 50 GB left on your machine and you will also be able to process 250 GB file.
Since it does not have to be done in Java, i would recommend Python for this:
Save the following in replace.py in the same folder with your textfile:
import fileinput

for line in fileinput.input("your-file.txt", inplace=True):
    # trailing comma: 'line' already ends with a newline, so don't add another
    print "%s" % (line.replace("oldstring", "newstring")),
replace the two strings with your string and execute python replace.py

Java Substring match is failing

I am using a text file to read values from and loading this file into a BufferedReader. Then I read the file line by line and check whether any line contains one of my keywords (I already have them in a list of Strings).
However, even though a line contains the keyword I am looking for, it does not detect it and reports a Miss. Here is the code:
for (int i = 0; i < sortedKeywordList.size(); i++)
{
    String tempString = sortedKeywordList.get(i);
    while (US.readLine() != null)
    {
        String str = US.readLine();
        //System.out.println(str);
        if (str.contains(tempString)) {
            System.out.println("Contains: " + tempString);
        }
        else {
            System.out.println("Miss");
        }
    }
}
For each keyword, you're iterating through your buffer using readLine(). So after your first keyword, you'll have exhausted your buffer reading and the next keyword test won't even execute since US.readLine() is giving you null. You're not re-initialising your reader.
So why not iterate through your file once (using your readLine() structure), and then for each line iterate through your keywords ?
EDIT: As Hunter has pointed out (above), you're also calling readLine() twice per loop: once in your loop test and once to check each line for a keyword. I would first of all make sure you're reading the file correctly (simply by printing out each line as you read it).
You're calling US.readLine() twice!
Try instead:
String tempString;
String str;
for (...)
{
    tempString = sortedKeywordList.get(i);
    while ((str = US.readLine()) != null)
    {
        ...
    }
}
You are calling US.readLine() once in the while loop condition and again inside the loop body; this moves the input to the next line. Also, compare whole strings with .equals() and check for substrings with .contains().
I'm seeing two major problems.
You've got your loops backwards.
The way you've written it, it looks at keyword1, and then looks through the whole input, checking for keyword1. Now, there's no more input, and it moves to keyword2, but there's no input left for it to check, so it quickly iterates through the rest of your keywords and quits.
You want to loop through the input, checking for each keyword, not through the keywords, checking each line of input.
while (input) {
    for each keyword {
        ...
You're calling .readLine() twice for each iteration, effectively skipping every other line.
Try storing the first line outside of the loop, checking for null in your loop condition, and then calling readLine juuust before the end of your loop.
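Put together, the fixed structure would look something like this (just a sketch, reusing the names from your snippet):
String str = US.readLine();                 // read the first line outside the loop
while (str != null) {
    for (int i = 0; i < sortedKeywordList.size(); i++) {
        String tempString = sortedKeywordList.get(i);
        if (str.contains(tempString)) {
            System.out.println("Contains: " + tempString);
        }
    }
    str = US.readLine();                    // advance to the next line at the very end
}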
The dataset in question would be helpful. Without it, a couple thoughts -
Verify the case of the sorted keywords matches the case in the text file. If they are mismatched and you need to support case-insensitive matching, convert both strings to the same case (e.g., use toUpperCase()) and then use the contains() call, as in the snippet after this list.
Verify no extra characters (like linefeeds etc.) are appended to the end of the sorted keyword.
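For example, a case-insensitive check (variable names taken from the question):
// Case-insensitive match: normalise both sides first
if (str.toUpperCase().contains(tempString.toUpperCase())) {
    System.out.println("Contains: " + tempString);
}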

Java scanner to detect blank line?

I've been having lots of trouble trying to get either a scanner or a buffered reader to try and detect a blank line. For example if I have a file that contains:
there
cat
dog
(BLANK LINE)
If I do this:
while (scan.hasNextLine())
{
    String line = scan.nextLine();
    ...
    ...
}
The scanner doesn't pick up the blank line. I tried to use a BufferedReader as well, but I ran into the same issue. Is there some way the scanner can just return a "" whenever it finds a blank line like that? Cheers
Your input has as many lines as it has \n characters. Given the input
"there\ncat\ndog\n"
the next-lines will be correctly divided as
"there\n"
"cat\n"
"dog\n"
(In other words, there is no fourth blank line, since it is not terminated by a \n.)
Put differently, after the "dog\n" has been read, the scanner (or buffered reader for that matter) has reached EOF and there's not even an empty line to return. (Note that when the lines are returned, the new-line character is stripped off.)
So, since this is the expected behavior, I don't know what the easiest fix is. I suspect that the best way to solve this is simply to append a \n to the input, so that the loop runs an extra iteration.
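One way to do that (a sketch; it reads the whole file into memory first, so it only makes sense for small inputs, and the file name is made up):
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Scanner;

public class BlankLineExample {
    public static void main(String[] args) throws Exception {
        String content = new String(Files.readAllBytes(Paths.get("input.txt")), "UTF-8");
        Scanner scan = new Scanner(content + "\n");   // extra terminator => one extra nextLine()
        while (scan.hasNextLine()) {
            String line = scan.nextLine();
            System.out.println(line.isEmpty() ? "(blank line)" : line);
        }
    }
}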

BufferedReader problem in Java

My buddy and I are working on a program for our Object Oriented Programming course at college. We are trying to write text into a file as a database for information. The problem is that when we try to read particular lines with BufferedReader, we can't figure out how to get to the correct line. The only methods available seem to be read(), which reads a single character; readLine(), which reads a line (but not the line we want it to read); and skip(), which only skips a specified number of characters. Does anyone have an idea how we could tell the program which line we want to read? Our method getAnswer(), with the argument int rowNumber, is the one we are trying to write:
Superclass: http://pastebin.com/d2d9ac07f
Subclass is irrelevant(mostly because we haven't written it yet).
Of course it is Java we are working with.
Thanks beforehand.
You will have to use readLine(), do this in a loop, count the number of lines you've already read until you've reached the line number that you want to process.
There is no method in BufferedReader or other standard library class that will read line number N for you automatically.
Use the BufferedReader's readLine() method until you get to the data you need. Throw away everything you don't need and store the data you do. Granted, this isn't efficient, but it should get the job done.
readLine() in Java simply reads from the buffer until it comes upon a newline character, so there would really be no way for you to specify which line should be read from a file because there is no way for Java to know exactly how long each line is.
This reason is also why it's difficult to use skip() to jump to a particular line.
It might be better for you to loop through lines using readLine(), then when your counter is where you'd like it to be, begin processing.
String line = myBufferedReader.readLine();
for (int i = 1; i < whichLine && line != null; i++) {
    line = myBufferedReader.readLine();
}
/* do something */
