Read last n lines in reverse order - java

I am trying to read the last n lines of a file in reverse order. Is this the most efficient way to do it? My file is not big, but it could eventually grow to several GB. Also, I am trying to read the last 10 lines, but this one only returns the last 9. Am I missing anything?
// Read n lines from the end of the file
public void readFromLast(File file, int lines) {
    int readLines = 0;
    StringBuilder builder = new StringBuilder();
    RandomAccessFile randomAccessFile = null;
    try {
        randomAccessFile = new RandomAccessFile(file, "r");
        long fileLength = file.length() - 1;
        // Set the pointer to the end of the file
        randomAccessFile.seek(fileLength);
        for (long pointer = fileLength; pointer >= 0; pointer--) {
            randomAccessFile.seek(pointer);
            char c = (char) randomAccessFile.read();
            builder.append(c);
            if (c == '\n') {
                builder = builder.reverse();
                System.out.print(builder.toString());
                readLines++;
                builder = null;
                builder = new StringBuilder();
                if (readLines == lines + 1) {
                    break;
                }
            }
        }
    } catch (FileNotFoundException e) {
        log.info("FileNotFound " + e.getMessage() + " occurred while reading last n lines");
        e.printStackTrace();
    } catch (IOException e) {
        log.info("IOException " + e.getMessage() + " occurred while reading last n lines");
    } finally {
        if (randomAccessFile != null) {
            try {
                randomAccessFile.close();
            } catch (IOException e) {
                log.info("IOException " + e.getMessage() + " occurred while closing the file after reading last n lines");
            }
        }
    }
}

Your code is fine. I am pretty sure it is reading one more line than it should, not one less. Are you perhaps testing on a file that does not have enough lines?
If you want to correct it, remove the + 1 from if (readLines == lines + 1) { and it will be fine.
Also, a tip: instead of setting the StringBuilder to null and creating a new one, you can use
builder.setLength(0);
it is a bit cleaner.
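Applying both suggestions, the body of the loop might look like this (a minimal sketch; the rest of the method stays unchanged):

if (c == '\n') {
    System.out.print(builder.reverse().toString());
    readLines++;
    builder.setLength(0); // reuse the builder instead of re-allocating it
    if (readLines == lines) { // no + 1: stop once n lines have been printed
        break;
    }
}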

Related

Java - Buffer - My Code skips the last Line of a TextFile when reading

I'm trying to read a text file with a number on each line into an ArrayList.
When I execute the following function, it always skips the last element.
Can somebody help me out? I don't get the problem here: since it reads until the buffer is empty, it should stop when the end of the file is reached, correct?
List<Double> lines = new ArrayList<>();
ByteBuffer buffer = ByteBuffer.allocateDirect(1024);
StringBuilder oneLine = new StringBuilder();
try (SeekableByteChannel byteChannel = Files.newByteChannel(Paths.get(fileName))) {
    while (byteChannel.read(buffer) > 0) {
        buffer.flip();
        for (int i = 0; i < buffer.limit(); i++) {
            char c = (char) buffer.get();
            if (c == '\r') {
                // Skip it
            }
            if (c == '\n') {
                System.out.println(oneLine.toString()); // Test output to see what it got
                lines.add(Double.parseDouble(oneLine.toString().replace(',', '.')));
                oneLine.setLength(0);
            } else {
                if (c != '\r') {
                    oneLine.append(c);
                }
            }
        }
        buffer.clear();
    }
    System.out.println("Anzahl zeilen: " + (lines.size()));
    System.out.println("Finished");
} catch (IOException e) {
    System.out.println("The File that was defined could not be found");
    e.printStackTrace();
}
return lines;
The text file, with one number on each line:
999973
22423
999974
999975
999976
999977
573643
999978
999979
999980
999981
34322
999982
999983
999984
999985
999986
999987
999988
3
67
84,000
7896575543
8.0
100001
9999991
8.0
When you reach the end of the file and there is no line break at the end, you add the characters to the StringBuilder but never add them to the list. You can either add a newline to the end of your text file or flush whatever is left in oneLine once the read loop has finished (flushing before each buffer.clear() would break numbers that span two buffer fills).
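A minimal sketch of that final flush, using the oneLine and lines variables from the snippet above:

// after the while (byteChannel.read(buffer) > 0) loop has finished
if (oneLine.length() > 0) {
    // the file ended without a trailing newline, so the last number is still buffered
    lines.add(Double.parseDouble(oneLine.toString().replace(',', '.')));
    oneLine.setLength(0);
}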

Quickly read in large amount of data

I am looking for a quick way to read the roughly 150 MB worth of spectroscopic data I have into a program I am writing. The data is currently stored in a text file (.dat) and its content is stored in a format like:
489.99992 490.00000
0.01178
0.01409
where the first N values are x values separated by spaces and the last N values are y values separated by newline characters (e.g. x1 = 489.99992, x2 = 490.00000, y1 = 0.01178, y2 = 0.01409).
I wrote the following parser,
private void parse()
{
    FileReader reader = null;
    String currentNumber = "";
    int indexOfIntensity = 0;
    long startTime = System.currentTimeMillis();
    try
    {
        reader = new FileReader(FILE);
        char[] chars = new char[65536];
        boolean waveNumMode = true;
        double valueAsDouble;
        // get buffer-sized chunks of data from the file
        for (int len; (len = reader.read(chars)) > 0;)
        {
            // parse through the buffer
            for (int i = 0; i < len; i++)
            {
                // is a new number if true
                if ((chars[i] == ' ' || chars[i] == '\n') && currentNumber != "")
                {
                    try
                    {
                        valueAsDouble = Double.parseDouble(currentNumber);
                    } catch (NumberFormatException nfe)
                    {
                        System.out.println("Could not convert to double: " + currentNumber);
                        currentNumber = "";
                        continue;
                    }
                    if (waveNumMode)
                    {
                        //System.out.println("Wavenumber: " + valueAsDouble);
                        listOfPoints.add(new Tuple(valueAsDouble));
                    } else
                    {
                        //System.out.println("Intensity: " + valueAsDouble);
                        listOfPoints.get(indexOfIntensity).setIntensityValue(valueAsDouble);
                        indexOfIntensity++;
                    }
                    if (chars[i] == '\n')
                    {
                        waveNumMode = false;
                    }
                    currentNumber = ""; // clear for the next number
                    continue;
                }
                currentNumber += chars[i];
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    try
    {
        reader.close();
    } catch (IOException e)
    {
        e.printStackTrace();
    }
    long stopTime = System.currentTimeMillis();
    System.out.println("Execution time: " + ((stopTime - startTime) / 1000.0) + " seconds");
}
but this takes around 50 seconds to finish for the 150 MB file. For reference, we are using another piece of software which does this in roughly half a second (however, it uses its own custom file type). I am willing to use a different file type, or whatever really, if it brings the execution time down. How can I speed this up?
Thanks in advance
In order to optimize code, you first need to find what parts of the code are slowing things down. Use a profiler to measure your code's performance and identify what parts are slowing down the process.
Try reading all the bytes from the file at once and then parsing:
Files.readAllBytes(Paths.get(fileName))
since many small reader.read() calls are costly in Java.
You can also try wrapping your FileReader in a BufferedReader and then checking whether you get any performance gain.
For more info, visit the link:
https://www.geeksforgeeks.org/different-ways-reading-text-file-java/
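As an illustration of the bulk-read idea, here is a minimal sketch; the file name is hypothetical, and the parse is a simple single pass that splits on spaces and line breaks:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class BulkParseSketch {
    public static void main(String[] args) throws IOException {
        String fileName = "spectra.dat"; // hypothetical path
        byte[] data = Files.readAllBytes(Paths.get(fileName)); // one bulk read
        List<Double> values = new ArrayList<>();
        int start = -1; // start index of the current token; -1 means no token open
        for (int i = 0; i <= data.length; i++) {
            boolean delimiter = i == data.length
                    || data[i] == ' ' || data[i] == '\n' || data[i] == '\r';
            if (delimiter) {
                if (start >= 0) {
                    values.add(Double.parseDouble(new String(data, start, i - start)));
                    start = -1;
                }
            } else if (start < 0) {
                start = i;
            }
        }
        System.out.println("Parsed " + values.size() + " numbers");
    }
}

This avoids both the per-chunk read() overhead and the repeated String concatenation (currentNumber += chars[i]) in the original parser, which is the more likely bottleneck.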

finding the number of occurrences for a specific char using recursion

The code below is part of a program that finds the number of occurrences of an input character in a text file.
public static void main(String[] args) {
    [...]
    java.io.File file1 = new java.io.File(dirPath1);
    FileInputStream fis = new FileInputStream(file1);
    System.out.println(" return " + rec(sc.next().charAt(0), fis));
}

public static int rec(char ch, FileInputStream fis) throws IOException {
    char current = 0;
    if (fis.available() == 0) {
        return 0;
    }
    if (fis.read() != -1) {
        current = (char) fis.read();
    }
    if (current == ch) {
        return 1 + rec(ch, fis);
    } else {
        return rec(ch, fis);
    }
}
The problem is:
If the file has one character and ch is that character, it returns 0. When I traced the code I found that it doesn't enter if (current == ch), although they are the same char.
If there is more than one character, some of the matching characters enter the if block and others don't.
How can I fix this?
Is there another way to find the number of occurrences recursively?
Another question: should I use try and catch in the rec method to catch the IOException?
Thanks in advance
P.S. This program is from an assignment; I have to use recursion and compare it with iteration.
You call fis.read() twice: the first call reads the first character but throws its value away, and the second call reads the one after it (or nothing at the end of the file), so every other character is skipped.
Here is the corrected version:
public static int rec(char ch, FileInputStream fis) throws IOException {
    char current = 0;
    if (fis.available() == 0) {
        return 0;
    }
    int read = fis.read();
    if (read != -1) {
        current = (char) read;
    }
    if (current == ch) {
        return 1 + rec(ch, fis);
    } else {
        return rec(ch, fis);
    }
}
My suggestion would be as follows:
Read the whole text file into a java.lang.String.
Then use the Apache Commons Lang library and this method for counting the occurrences:
http://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#countMatches-java.lang.CharSequence-java.lang.CharSequence-
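A minimal sketch of that approach, assuming commons-lang3 is on the classpath (the file name is hypothetical):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.lang3.StringUtils;

public class CountDemo {
    public static void main(String[] args) throws IOException {
        // read the whole file into a String
        String content = new String(Files.readAllBytes(Paths.get("MyFile.txt")));
        // count occurrences of the character, passed as a one-character CharSequence
        int count = StringUtils.countMatches(content, "a");
        System.out.println("Occurrences: " + count);
    }
}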
You should use a FileReader to read chars from a text file:
Reader reader = new FileReader("MyFile.txt");
I think using while ((i = reader.read()) != -1) is a better approach than the three ifs and an else.
So you can achieve this with fewer lines of code:
public static int rec(char ch, Reader reader) throws IOException {
    char current = 0;
    int i;
    while ((i = reader.read()) != -1) {
        current = (char) i;
        if (current == ch) {
            return 1 + rec(ch, reader);
        } else {
            return rec(ch, reader);
        }
    }
    return 0;
}
I think there is no need to use try and catch inside the rec method to catch the IOException; I have used it at the call site instead:
try {
    Reader reader = new FileReader("MyFile.txt");
    System.out.println(" return " + rec('a', reader));
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
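Since the assignment asks you to compare recursion with iteration, an iterative counterpart could look like this (a sketch; iter is a hypothetical name, using the same Reader as above):

public static int iter(char ch, Reader reader) throws IOException {
    int count = 0;
    int i;
    // same traversal as rec, but a loop and a counter replace the recursive calls
    while ((i = reader.read()) != -1) {
        if ((char) i == ch) {
            count++;
        }
    }
    return count;
}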

Can reading the dataset be faster in time and/or better in memory than this?

In Java, here is the code to read a file with a table of integers:
public static int[][] getDataset() {
    // open data file to read n and m size parameters
    BufferedReader br = null;
    try {
        br = new BufferedReader(new FileReader(filePath));
    } catch (FileNotFoundException e) {
        e.printStackTrace();
        System.exit(1);
    }
    // count the number of lines
    int i = -1;
    String line = null, firstLine = null;
    do {
        // read line
        try {
            line = br.readLine();
            i++;
            if (i == 0) firstLine = line;
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(1);
        }
    } while (line != null);
    // close data file
    try {
        br.close();
    } catch (IOException e) {
        e.printStackTrace();
        System.exit(1);
    }
    // check the data for emptiness
    if (i == 0) {
        System.out.println("The dataset is empty!");
        System.exit(1);
    }
    // initialize n and m (at least the first line exists)
    n = i; m = firstLine.split(" ").length;
    firstLine = null;
    // open data file to read the dataset
    br = null;
    try {
        br = new BufferedReader(new FileReader(filePath));
    } catch (FileNotFoundException e) {
        e.printStackTrace();
        System.exit(1);
    }
    // initialize dataset
    int[][] X = new int[n][m];
    // process data
    i = -1;
    while (true) {
        // read line
        try {
            line = br.readLine();
            i++;
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(1);
        }
        // exit point
        if (line == null) break;
        // convert a line (string of integers) into a dataset row
        String[] stringList = line.split(" ");
        for (int j = 0; j < m; j++) {
            X[i][j] = Integer.parseInt(stringList[j]);
        }
    }
    // close data file
    try {
        br.close();
    } catch (IOException e) {
        e.printStackTrace();
        System.exit(1);
    }
    return X;
}
Dataset size parameters n and m are static int fields declared outside the method (they cannot be final, since getDataset assigns them), as is the static final String filePath.
I give my solution here (maybe it will be useful for newbies who read this later) and ask whether it can be made faster and/or less memory-hungry. I'm interested in thorough micro-optimization; any advice would be great. In particular, I do not like the way the file is opened twice.
Read the file only once and add all lines to an ArrayList<String>; an ArrayList grows automatically.
Later, process that ArrayList to split the lines.
Further optimisations:
String.split uses a heavyweight regular-expression engine. Try StringTokenizer or your own string-split method instead.
Instead of an ArrayList you could use your own GrowingIntArray or GrowingStringArray; these avoid some overhead but are less handy.
Speed and memory usage are conflicting goals; often you cannot optimize both.
You can save memory by using a one-dimensional array: in Java, 2-D arrays need more space because each row is a separate object.
Access the one-dimensional array with X[row * m + col].
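A sketch of the single-pass version under those suggestions (getDatasetOnePass is a hypothetical name; it assumes space-separated integers and returns the one-dimensional layout):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public static int[] getDatasetOnePass(String filePath) throws IOException {
    // read the file once; the list grows as needed
    List<String> rows = new ArrayList<>();
    try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
        String line;
        while ((line = br.readLine()) != null) {
            rows.add(line);
        }
    }
    int n = rows.size();
    int m = new StringTokenizer(rows.get(0), " ").countTokens();
    int[] X = new int[n * m]; // one-dimensional layout: X[row * m + col]
    for (int row = 0; row < n; row++) {
        StringTokenizer tokens = new StringTokenizer(rows.get(row), " ");
        for (int col = 0; col < m; col++) {
            X[row * m + col] = Integer.parseInt(tokens.nextToken());
        }
    }
    return X;
}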

Splitting a Byte Array on A Particular Byte

I am trying to read an old .dat file byte by byte, and have run into an issue: a record is terminated by \n (newline). I'd like to read in the whole byte array, then split it on the character.
I can do this by reading the whole byte array from the file, creating a String with the contents of the byte array, then calling String.split(), but find this to be inefficient. I'd rather split the byte array directly if possible.
Can anyone assist?
Update: Code was requested.
public class NgcReader {

    public static void main(String[] args) {
        String location;
        if (System.getProperty("os.name").contains("Windows")) {
            location = "F:\\Programming\\Projects\\readngc\\src\\main\\java\\ngcreader\\catalog.dat";
        } else {
            location = "/media/My Passport/Programming/Projects/readngc/src/main/java/ngcreader/catalog.dat";
        }
        File file = new File(location);
        InputStream is = null;
        try {
            is = new FileInputStream(file);
        } catch (FileNotFoundException e) {
            System.out.println("It didn't work!");
            System.exit(0);
        }
        byte[] fileByteArray = new byte[(int) file.length() - 1];
        try {
            is.read(fileByteArray);
            is.close();
        } catch (IOException e) {
            System.out.println("IOException!");
            System.exit(0);
        }
        // I do NOT like this. I'd rather split the byte array on the \n character
        String bigString = new String(fileByteArray);
        List<String> stringList = Arrays.asList(bigString.split("\\n"));
        for (String record : stringList) {
            System.out.print("Catalog number: " + record.substring(1, 6));
            System.out.print(" Catalog type: " + record.substring(7, 9));
            System.out.print(" Right Ascension: " + record.substring(10, 12) + "h " + record.substring(13, 17) + "min");
            System.out.print(" Declination: " + record.substring(18, 21) + " " + record.substring(22, 24));
            if (record.length() > 50) {
                System.out.print(" Magnitude: " + record.substring(47, 51));
            }
            if (record.length() > 93) {
                System.out.print(" Original Notes: " + record.substring(54, 93));
            }
            if (record.length() > 150) {
                System.out.print(" Palomar Notes: " + record.substring(95, 150));
            }
            if (record.length() > 151) {
                System.out.print(" Notes: " + record.substring(152));
            }
            System.out.println();
        }
    }
}
Another Update: Here's a README with a description of the file I'm processing:
http://cdsarc.u-strasbg.fr/viz-bin/Cat?VII/1B
It sounds like this might actually just be a text file to start with, in which case:
InputStream stream = new FileInputStream(location);
try {
    BufferedReader reader = new BufferedReader(new InputStreamReader(stream, "ASCII"));
    String line;
    while ((line = reader.readLine()) != null) {
        // Handle the line, ideally in a separate method
    }
} finally {
    stream.close();
}
This way you never need to have more than a single line of the file in memory at a time.
If you're set on using byte arrays...

byte[] buff = new byte[1024]; // smaller buffer
try {
    int ind = 0, from = 0, read;
    while ((read = is.read(buff, ind, buff.length - ind)) != -1) {
        for (int i = ind; i < ind + read; i++) {
            if (buff[i] == '\n') {
                // the third argument is a length, not an end index
                String record = new String(buff, from, i + 1 - from);
                // handle the record
                from = i + 1;
            }
        }
        // shift the unfinished tail of the buffer to the front
        System.arraycopy(buff, from, buff, 0, ind + read - from);
        ind = ind + read - from;
        from = 0;
    }
} catch (IOException e) {
    System.out.println("IOException!");
    // System.exit(0);
    throw new RuntimeException(e); // cleaner way to die
} finally {
    is.close();
}
This also avoids loading the entire file into memory, and it puts the close inside a finally block.
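If you do want to split the byte array itself instead of going through a String, a simple index scan works; here is a sketch (splitOnNewline is a hypothetical helper):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

static List<byte[]> splitOnNewline(byte[] data) {
    List<byte[]> records = new ArrayList<>();
    int recordStart = 0;
    for (int i = 0; i < data.length; i++) {
        if (data[i] == '\n') {
            // copy the record without its terminating newline
            records.add(Arrays.copyOfRange(data, recordStart, i));
            recordStart = i + 1;
        }
    }
    if (recordStart < data.length) {
        // trailing record with no final newline
        records.add(Arrays.copyOfRange(data, recordStart, data.length));
    }
    return records;
}

Each byte[] in the result can then be wrapped in a String, or parsed directly, one record at a time.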
