Read nth line from the end of the file - java

I want to read the nth line from the end of a file. However, my file is large (about 15MB), so I don't want to go through every line to find it. Is there an efficient way to get this nth line?
I went through the RandomAccessFile API, but my lines are not of constant length, so I was not able to move the file pointer to the location of the nth line from the end. Can someone help me?

You basically have to read the file backwards. The simplest approach, without using "block" reads, is to get the length of the file, and then use RandomAccessFile to read single bytes backwards from the end (seeking to length - 1, length - 2, and so on) until you have counted the required number of line feeds. You can then read the bytes forward for one line.
Something like this....
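int n = 10; // which line from the end to read (1 = last line)
try (RandomAccessFile randomAccessFile = new RandomAccessFile("the.log", "r")) {
    long last = randomAccessFile.length() - 1;
    long offset = last;
    int found = 0;
    while (offset >= 0) {
        randomAccessFile.seek(offset);
        // 10 is the line feed character; ignore one that terminates the file
        if (randomAccessFile.read() == 10 && offset != last && ++found == n) {
            break;
        }
        offset--;
    }
    // offset is now at the line feed before the wanted line, or -1 at the start of the file
    randomAccessFile.seek(offset + 1);
    System.out.println(randomAccessFile.readLine());
}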
Single-byte reads may not be very efficient. If performance becomes a problem, you can take the same approach but read larger blocks of the file (say 8K) at a time, rather than 1 byte at a time.
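The block-read variant could look roughly like the sketch below. The class and method names are made up for illustration, and it assumes the file uses line feeds (optionally preceded by carriage returns) and is UTF-8 encoded.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class TailReader {
    /** Returns the nth line from the end (1 = last line), or null if the file has fewer lines. */
    static String nthLineFromEnd(String path, int n) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            if (length == 0 || n < 1) {
                return null;
            }
            // Ignore a single line feed that terminates the file.
            long end = length;
            raf.seek(length - 1);
            if (raf.read() == '\n') {
                end--;
            }
            byte[] block = new byte[8192];
            long lineEnd = end; // exclusive end of the line currently being scanned
            long pos = end;     // start of the region not yet scanned
            int found = 0;
            while (pos > 0) {
                int size = (int) Math.min(block.length, pos);
                pos -= size;
                raf.seek(pos);
                raf.readFully(block, 0, size);
                // Scan this block backwards, counting line feeds.
                for (int i = size - 1; i >= 0; i--) {
                    if (block[i] == '\n') {
                        found++;
                        if (found == n) {
                            return readRange(raf, pos + i + 1, lineEnd);
                        }
                        lineEnd = pos + i;
                    }
                }
            }
            // Reached the start of the file: the first line is line (found + 1) from the end.
            return found + 1 == n ? readRange(raf, 0, lineEnd) : null;
        }
    }

    private static String readRange(RandomAccessFile raf, long start, long end) throws IOException {
        byte[] bytes = new byte[(int) (end - start)];
        raf.seek(start);
        raf.readFully(bytes);
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--; // drop the carriage return of a CRLF line ending
        }
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }
}
```

Calling TailReader.nthLineFromEnd("the.log", 10) would then return the tenth line from the end, touching only the last few blocks of the file.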

Have a look at this answer, which shows that you do need to read through the file (15MB is not big). As long as you are only storing the latest 9 rows, you will be able to fly through the file.
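A minimal sketch of that forward-scan idea, assuming a UTF-8 text file; the class and method names are hypothetical. It keeps only the most recent n lines in memory while streaming through the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayDeque;

class LastLines {
    /** Returns the nth line from the end (1 = last line), or null if the file has fewer lines. */
    static String nthFromEnd(String path, int n) throws IOException {
        if (n < 1) {
            return null;
        }
        // Sliding window holding at most the n most recent lines.
        ArrayDeque<String> window = new ArrayDeque<>(n);
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (window.size() == n) {
                    window.removeFirst(); // forget the oldest line
                }
                window.addLast(line);
            }
        }
        // The oldest line still in the window is the nth from the end.
        return window.size() == n ? window.peekFirst() : null;
    }
}
```

This reads every byte once but never holds more than n lines, which is usually fine for a file of this size.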

Related

handling comp3 and EBCDIC conversion in java to ASCII for large files

I am trying to convert COMP-3 and EBCDIC data in my Java code, but I'm running into an out of memory exception because the amount of data handled is huge, about 5 GB. My code is currently as follows:
byte[] data = Files.readAllBytes(path);
This results in an out of memory exception, which I can understand, but I can't use a file scanner either, since the data in the file won't be split into lines.
Can anyone point me in the correct direction on how to handle this?
Note: the file may contain records of different lengths, hence splitting it based on record length seems not possible.
As Bill said, you could (should) ask for the data to be converted to display characters on the mainframe, and if it is English-language data you can do an ASCII transfer.
Also, how are you deciding where the COMP-3 fields start?
You do not have to read the whole file into memory; you can still read the file in blocks. This method will fill an array of bytes:
protected final int readBuffer(InputStream in, final byte[] buf)
        throws IOException {
    int total = 0;
    int num = in.read(buf, total, buf.length);
    // keep reading until the buffer is full or the stream ends
    while (num >= 0 && total + num < buf.length) {
        total += num;
        num = in.read(buf, total, buf.length - total);
    }
    if (num > 0) {
        total += num;
    }
    return total; // number of bytes actually read
}
If all the records are the same length, create an array of the record length and the above method will read one record at a time.
Finally the JRecord project has classes to read fixed length files etc. It can do comp-3 conversion. Note: I am the author of JRecord.
I'm running into out of memory exception as the amount of data handled is huge about 5 gb.
You only need to read one record at a time.
My code is currently as follows:
byte[] data = Files.readAllBytes(path);
This is resulting in an out of memory exception which i can understand
Me too.
but i cant use a file scanner as well since the data in the file wont be split into lines.
You mean you can't use the Scanner class? That's not the only way to read a record at a time.
In any case not all files have record delimiters. Some have fixed-length records, some have length words at the start of each record, and some have record type attributes at the start of each record, or in both cases at least in the fixed part of the record.
I'll have to split it based on an attribute record_id at a particular position (say at the beginning of each record) that will tell me the record length
So read that attribute, decode it if necessary, and read the rest of the record according to the record length you derive from the attribute. One at a time.
I direct your attention to the methods of DataInputStream, especially readFully(). You will also need a Java COMP-3 library. There are several available. Most of the rest can be done by built-in EBCDIC character set decoders.
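As a rough illustration of reading one record at a time with readFully(), the sketch below assumes a purely hypothetical layout in which each record starts with a 2-byte big-endian length; the real file's layout will differ and has to come from its copybook. The COMP-3 decoder is hand-written for the example (two decimal digits per byte, sign in the last nibble) and is not taken from any particular library.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Consumer;

class RecordReader {
    /** Decodes a COMP-3 (packed decimal) field into a long, ignoring any implied decimal point. */
    static long unpackComp3(byte[] field) {
        long value = 0;
        for (int i = 0; i < field.length; i++) {
            int high = (field[i] >> 4) & 0x0F;
            int low = field[i] & 0x0F;
            value = value * 10 + high;
            if (i < field.length - 1) {
                value = value * 10 + low;
            } else if (low == 0x0D || low == 0x0B) {
                value = -value; // sign nibble: D or B is negative, C or F is positive/unsigned
            }
        }
        return value;
    }

    /** Passes each record to the handler, one at a time, without buffering the whole stream. */
    static void readRecords(InputStream in, Consumer<byte[]> handler) throws IOException {
        DataInputStream data = new DataInputStream(in);
        while (true) {
            int length;
            try {
                length = data.readUnsignedShort(); // hypothetical 2-byte length prefix
            } catch (EOFException endOfStream) {
                return; // no more records
            }
            byte[] payload = new byte[length];
            data.readFully(payload); // blocks until the whole record has been read
            handler.accept(payload);
        }
    }
}
```

For a 5 GB file the stream would typically be a BufferedInputStream over a FileInputStream, so memory use stays at one record plus the buffer.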

How to speed up reading of a large OBJ (text) file?

I am using an OBJ loader library that I found on the net, and I want to look into speeding it up.
It works by reading an .OBJ file line by line (it's basically a text file with lines of numbers).
I have a 12 MB OBJ file that equates to approximately 400,000 lines. Suffice it to say, reading it line by line takes forever.
Is there a way to speed it up? It uses a BufferedReader to read the file (which is stored in my assets folder).
Here is the link to the library: click me
Just an idea: you could first get the size of the file using the File class, after getting the file:
File file = new File(sdcard,"sample.txt");
long size = file.length();
The size returned is in bytes. Divide the file into a reasonable number of chunks (e.g. 5, 10 or 20), recording the byte range of each chunk. Create a byte array of the same size as each chunk, then assign each chunk to a separate worker thread, which reads its chunk into its array using the read(buffer, offset, length) method, i.e. it reads "length" bytes into the array "buffer" beginning at array index "offset". Check the chunk boundaries so that no thread's range overlaps another's. Then concatenate the arrays in order and convert the bytes into characters to get the complete file contents. Again, this is just an idea; hopefully it will work when actually implemented.
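One way this idea might be sketched is shown below. It assumes a regular file on disk that is smaller than 2 GB and UTF-8 encoded, and the class and method names are invented for the example. Each worker reads its own byte range into a shared array, and the bytes are decoded only once at the end so that no multi-byte character is split at a chunk boundary.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedReader {
    static String readInChunks(String path, int chunks) throws Exception {
        long size = new File(path).length();
        byte[] all = new byte[(int) size]; // assumption: the file fits in one array
        // Ceiling division, so the ranges cover the file without overlapping.
        long chunkSize = (size + chunks - 1) / chunks;
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (long start = 0; start < size; start += chunkSize) {
                final int offset = (int) start;
                final int length = (int) Math.min(chunkSize, size - start);
                tasks.add(pool.submit(() -> {
                    // Each worker opens its own handle, so the seeks do not interfere.
                    try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
                        raf.seek(offset);
                        raf.readFully(all, offset, length);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }));
            }
            for (Future<?> task : tasks) {
                task.get(); // wait for the chunk and surface any failure
            }
        } finally {
            pool.shutdown();
        }
        return new String(all, StandardCharsets.UTF_8);
    }
}
```

Whether this beats a single buffered sequential read depends on the storage; it is worth measuring before adopting it.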

Java reading nth line

I am trying to read a specific line from a text file, but I don't want to load the file into memory (it can get really big).
I have been looking, but every example I have found requires either reading every line (this would slow my code down, as there are over 100,000 lines) or loading the whole thing into an array and getting the correct element (the file will have a lot of lines to input).
An example of what I want to do:
String line = File.getLine(5);
"code is not actual code, it is made up to show the principle of what i want"
Is there a way to do this?
-----Edit-----
I have just realized this file will be written to in between reading lines (adding to the end of the file).
Is there a way to do this?
Not unless the lines are of a fixed number of bytes each, no.
You don't have to actually keep each line in memory - but you've got to read through the whole file to get to the line you want, as otherwise you won't know where to start reading.
You have to read the file line by line. Otherwise how do you know when you have gotten to line 5 (as in your example)?
Edit:
You might also want to check out Random Access Files which could be helpful if you know how many bytes per line, as Jon Skeet has said.
The easiest way to do this would be to use a BufferedReader (http://docs.oracle.com/javase/1.5.0/docs/api/java/io/BufferedReader.html), because you can specify your buffer size. You could do something like:
BufferedReader in = new BufferedReader(new FileReader("foo.in"), 1024);
in.readLine();
in.readLine();
in.readLine();
in.readLine();
String line = in.readLine();
1) read a line which the user selects,
If you only need to read a user-selected line once or infrequently (or if the file is small enough), then you just have to read the file line-by-line from the start, until you get to the selected line.
If, on the other hand you need to read a user-selected line frequently, you should build an index of line numbers and offsets. So, for example, line 42 corresponds to an offset of 2347 bytes into the file. Ideally, then, you would only read the entire file once and store the index--for example, in a map, using the line numbers as keys and offsets as values.
2) read new lines added since the last read. i plan to read the file every 10 seconds. i have got the line count and can find out the new line numbers but i need to read that line
For the second point, you can simply save the current offset into the file instead of saving the current line number--but it certainly wouldn't hurt to continue building the index if it will continue to provide a significant performance benefit.
1. Use RandomAccessFile.seek(long offset) to set the file pointer to the most recently saved offset (confirm the file is longer than the most recently saved offset first--if not, nothing new has been appended).
2. Use RandomAccessFile.readLine() to read a line of the file.
3. Call RandomAccessFile.getFilePointer() to get the current offset after reading the line and optionally put(currLineNo+1, offset) into the index.
4. Repeat steps 2-3 until reaching the end of the file.
But don't get too carried away with performance optimizations unless the performance is already a problem or is highly likely to be a problem.
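The steps for picking up appended lines could be sketched as follows. The class name is hypothetical, and the sketch assumes the writer only ever appends complete lines. Note that RandomAccessFile.readLine() maps each byte directly to a char, so it is only reliable for single-byte encodings.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

class AppendedLineReader {
    private final String path;
    private long savedOffset = 0; // where the previous read stopped

    AppendedLineReader(String path) {
        this.path = path;
    }

    /** Returns the lines appended since the previous call (all lines on the first call). */
    List<String> readNewLines() throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            if (raf.length() <= savedOffset) {
                return lines; // nothing new has been appended
            }
            raf.seek(savedOffset);
            String line;
            while ((line = raf.readLine()) != null) {
                lines.add(line);
                savedOffset = raf.getFilePointer(); // remember where to resume
            }
        }
        return lines;
    }
}
```

Calling readNewLines() from a timer every 10 seconds would then return only what was written since the last poll.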
For small files:
String line = Files.readAllLines(Paths.get("file.txt")).get(n);
For large files:
String line;
try (Stream<String> lines = Files.lines(Paths.get("file.txt"))) {
line = lines.skip(n).findFirst().get();
}
Java 7:
String line;
try (BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
for (int i = 0; i < n; i++)
br.readLine();
line = br.readLine();
}
Source: Reading nth line from file
The only way to do this is to build an index of where each line is (you only need to record the end of each line). Without a way to randomly access a line based on an index from the start, you have to read every byte before that line.
BTW: Reading 100,000 lines might take only one second on a fast machine.
If performance is a big concern here and you are frequently reading random lines from a static file then you can optimize this a bit by reading through the file and building an index (basically just a long[]) of the starting offset of each line of the file.
Once you have this you know exactly where to jump to in the file and then you can read up to the next newline character to retrieve the full line.
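A small sketch of such an index, assuming a static file with line-feed line endings; the class name and its methods are illustrative only. The constructor scans the file once and records where each line starts, and getLine() then seeks straight to the requested line.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

class LineIndex {
    private final String path;
    private final long[] offsets; // offsets[i] = byte offset where line i starts

    LineIndex(String path) throws IOException {
        this.path = path;
        List<Long> starts = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            long pos = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    starts.add(pos);
                    atLineStart = false;
                }
                if (b == '\n') {
                    atLineStart = true; // the next byte, if any, begins a new line
                }
                pos++;
            }
        }
        offsets = starts.stream().mapToLong(Long::longValue).toArray();
    }

    int lineCount() {
        return offsets.length;
    }

    /** Reads line lineNo (0-based) by seeking directly to its start. */
    String getLine(int lineNo) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offsets[lineNo]);
            return raf.readLine(); // treats each byte as one char; fine for ASCII data
        }
    }
}
```

The index costs eight bytes per line, so even a few million lines fit comfortably in memory.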
Here is a snippet of some code I had which will read a file and write every 10th line including the first line to a new file (writer.) You can always replace the try section with whatever you want to do. To change the number of lines to read just change the 0 in the if statement "lc.endsWith("0")" to whatever line you want to read. But if the file is being written to as you read it, this code will only work with the data that is contained inside the file when you run this code.
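// Count the lines first; skip() reads to the end of the file.
int linecount;
try (LineNumberReader lnr = new LineNumberReader(new FileReader(new File(file)))) {
    lnr.skip(Long.MAX_VALUE);
    linecount = lnr.getLineNumber();
}
// bufferedReader and writer are assumed to be already open on the input and output files
for (int i = 0; i <= linecount; i++) {
    String line = bufferedReader.readLine();
    if (line == null) {
        break; // no more lines
    }
    String lc = String.valueOf(i);
    if (lc.endsWith("0")) {
        try {
            writer.append(line + "\n");
            writer.flush();
        } catch (Exception ee) {
            ee.printStackTrace();
        }
    }
}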

java reading from a file from a certain point to another

This is my first question here, I hope you kind sirs can help me and I thank you in advance.
I am trying to write a Java project using threads and the replicated workers paradigm. What I want to do is create a workpool of tasks. The tasks that the workers have to do is simply count the number of words in a specified file between two indices. I want to create a task like this: (file,startIndex,finishIndex). I have problems finding out what file handling class I should use to open a file and read the words from startIndex to finishIndex. I should also mention that I am given a chunk size and I am supposed to split the tasks using that. ChunkSize is an int representing the number of bytes
Bottom line: I want to read from a file from startIndex to startIndex + chunkSize.
I think you are looking for the RandomAccessFile class. It has a "seek" method that allows you to skip to a certain position in the file. Example:
int chunkSize = 64;
long startingIndex = 55;
byte[] bytesRead = new byte[chunkSize];
RandomAccessFile file = new RandomAccessFile("file.txt", "r");
file.seek(startingIndex);
file.read(bytesRead);
file.close();
Note that this will seek a number of bytes, not words. It's impossible to know where the words are in a file before reading it. Counting the spaces and adding one is a naive method that would work well in this case.
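Putting the seek and the counting together, a task of the form (file, startIndex, chunkSize) might be sketched like this. The names are hypothetical, and instead of counting spaces it counts transitions from whitespace to non-whitespace, which copes with repeated spaces. A word that straddles a chunk boundary is counted in both chunks, so a real implementation would need a rule for that.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class ChunkWordCounter {
    /** Counts the words that begin or continue within [startIndex, startIndex + chunkSize). */
    static int countWords(String path, long startIndex, int chunkSize) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            // Do not read past the end of the file.
            long available = Math.max(0, file.length() - startIndex);
            byte[] bytes = new byte[(int) Math.min(chunkSize, available)];
            file.seek(startIndex);
            file.readFully(bytes);

            int words = 0;
            boolean inWord = false;
            for (byte b : bytes) {
                boolean whitespace = Character.isWhitespace((char) (b & 0xFF));
                if (!whitespace && !inWord) {
                    words++; // first byte of a new word
                }
                inWord = !whitespace;
            }
            return words;
        }
    }
}
```

Each worker can call countWords with its own range, since every call opens its own file handle.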

How to read a file in java based on file number? [duplicate]

Is there any way to read a single line of a file in Java? I mean, if I want to read only the 100th line, can I read it directly? Or do I have to read the whole file until it comes to line 100?
You can use java.io.RandomAccessFile.
Note that seek() moves to a byte offset, not a line number, so this reaches the 100th line only if you already know its byte offset. The following seeks to byte offset 100 and prints the rest of the file:
RandomAccessFile file = new RandomAccessFile("D:\\test.txt", "r");
long totalBytes = file.length();
file.seek(100);
long pointer = file.getFilePointer();
for (long ct = pointer; ct < totalBytes; ct++) {
    byte b = file.readByte(); //read byte from the file
    System.out.print((char) b); //convert byte into char
}
file.close();
For more details, please see the link below:
http://tutorials.jenkov.com/java-io/randomaccessfile.html
No, no matter the abstractions, there really isn't an efficient way of "directly" reading the 100th line from a file-system. You can of course use offsets in case you have lines with fixed lengths per line (assuming CR or LF etc.) but that's it. You can't jump around in a file based on the "line" abstraction.
Under most circumstances you will need to read line-by-line starting from the start of the file.
There are exceptions to this:
If you create and maintain an index for the file that indicates the positions of each line start, you can lookup a line in the index and then seek the file to the position to read it.
If your file consists of fixed length lines, you can calculate the position of the start of a line as line_no * line_length, and then seek to that position.
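The fixed-length case can be sketched in a few lines. The names here are made up, lineNo is 0-based, and lineLength is assumed to include the line terminator.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class FixedLengthLines {
    /** Reads line lineNo from a file whose lines are all exactly lineLength bytes long. */
    static String readLine(String path, long lineNo, int lineLength) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long position = lineNo * lineLength; // line_no * line_length
            if (position >= raf.length()) {
                return null; // the file has no such line
            }
            raf.seek(position);
            return raf.readLine();
        }
    }
}
```

With variable-length lines the multiplication no longer gives the right position, which is why an index of line starts is needed in that case.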
