I'm reading a file line by line, and I'm trying to make it so that when I get to a line that fits my specific parameters (in my case, when it begins with a certain word), I can overwrite that line.
My current code:
try {
    FileInputStream fis = new FileInputStream(myFile);
    DataInputStream in = new DataInputStream(fis);
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    String line;
    while ((line = br.readLine()) != null) {
        System.out.println(line);
        if (line.startsWith("word")) {
            // replace line code here
        }
    }
} catch (Exception ex) {
    ex.printStackTrace();
}
...where myFile is a File object.
As always, any help, examples, or suggestions are much appreciated.
Thanks!
RandomAccessFile seems a good fit. Its javadoc says:
Instances of this class support both reading and writing to a random access file. A random access file behaves like a large array of bytes stored in the file system. There is a kind of cursor, or index into the implied array, called the file pointer; input operations read bytes starting at the file pointer and advance the file pointer past the bytes read. If the random access file is created in read/write mode, then output operations are also available; output operations write bytes starting at the file pointer and advance the file pointer past the bytes written. Output operations that write past the current end of the implied array cause the array to be extended. The file pointer can be read by the getFilePointer method and set by the seek method.
That said, since text files are a sequential format, you cannot replace a line with a line of a different length without moving all subsequent characters around, so replacing lines will in general amount to reading and writing the entire file. This may be easier to accomplish if you write to a separate file and rename the output file once you are done. This is also more robust in case something goes wrong, as one can simply retry with the contents of the initial file. The only advantage of RandomAccessFile is that you do not need the disk space for the temporary output file, and you may get slightly better performance out of the disk due to better access locality.
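For the narrow case where the replacement has exactly the same byte length as the original line, an in-place overwrite with RandomAccessFile does work. A minimal sketch, assuming ASCII content and a placeholder file name:

import java.io.IOException;
import java.io.RandomAccessFile;

public class InPlaceReplace {
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile("myFile.txt", "rw")) {
            long lineStart = 0;
            String line;
            while ((line = raf.readLine()) != null) {
                if (line.startsWith("word")) {
                    // the replacement must have the same byte length as the original line
                    String replacement = "WORD" + line.substring(4);
                    long afterLine = raf.getFilePointer();
                    raf.seek(lineStart);
                    raf.writeBytes(replacement); // writes one byte per char, so ASCII only
                    raf.seek(afterLine);         // resume reading after the line terminator
                }
                lineStart = raf.getFilePointer();
            }
        }
    }
}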
Your best bet here is likely going to be reading the file into memory (something like a StringBuilder) and writing what you want your output file to look like into the StringBuilder. After you're done reading the file completely, write the contents of the StringBuilder to the file.
If the file is too large to do this in memory, you can always read the contents of the file line by line and write them to a temporary file instead of a StringBuilder. After that is done, you can delete the old file and move the temporary one into its place, as in the sketch below.
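A minimal sketch of that temporary-file approach; the file names and the replacement rule are placeholders:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ReplaceLines {
    public static void main(String[] args) throws IOException {
        Path source = Paths.get("myFile.txt");
        Path temp = Paths.get("myFile.txt.tmp");
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("word")) {
                    line = "replacement line"; // overwrite matching lines
                }
                writer.write(line);
                writer.newLine();
            }
        }
        // replace the original with the rewritten copy
        Files.move(temp, source, StandardCopyOption.REPLACE_EXISTING);
    }
}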
This is an old question, but I recently worked on the same problem, so I'm sharing my experience.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public static void updateFile(Path file) {
    // Get all the lines
    try (Stream<String> stream = Files.lines(file, StandardCharsets.UTF_8)) {
        // Do the replace operation
        List<String> list = stream.map(line -> line.replaceAll("test", "new"))
                                  .collect(Collectors.toList());
        // Write the content back
        Files.write(file, list, StandardCharsets.UTF_8);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
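Calling it is then a one-liner; the path is a placeholder, and Paths comes from java.nio.file:

updateFile(Paths.get("myFile.txt"));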
I ran a small experiment reading a file with a BufferedReader, and I wanted to see what would happen if I called the delete method on the file before the read was complete. Given that BufferedReader only reads a chunk of the file at a time, I expected the operation to fail, but to my surprise the read was successful.
Here is the code I used:
val file = File("test.txt")
val bufferedReader = BufferedReader(InputStreamReader(FileInputStream(file)), 1)
if (file.delete())
    println("file deleted successfully")
println(bufferedReader.readLines().size)
I used a relatively big file for the test, around 300 MB in size, and I also set the buffer size to the minimum possible value. The execution prints this:
file deleted successfully
1303692
Did I misunderstand something here? Could someone please explain this behavior?
The motivation behind this experiment is that I have a method in my application that returns a sequence of all lines in a temporary file, and I wanted to remove the temporary file once all lines were read, like this:
fun getTempFileLines(): Sequence<String> {
    val file = File("temp.txt")
    val bufferedReader = BufferedReader(InputStreamReader(FileInputStream(file)))
    val sequenceOfLines = generateSequence {
        bufferedReader.readLine()
    }
    file.delete()
    return sequenceOfLines
}
From https://docs.oracle.com/javase/8/docs/api/java/io/BufferedReader.html
"Reads text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines."
So while the file name may already be deleted, the open stream still has access to the file's contents. On POSIX systems, delete() merely unlinks the directory entry; the underlying data is not reclaimed until the last open file descriptor is closed, which is why the full 300 MB file could still be read through a 1-byte buffer.
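A small Java sketch of the same behavior; test.txt is a placeholder, and note that on Windows the delete call itself usually fails while the file is open:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class DeleteWhileReading {
    public static void main(String[] args) throws IOException {
        File file = new File("test.txt");
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            System.out.println("deleted: " + file.delete()); // unlinks the name only
            // the open descriptor keeps the data readable until the stream is closed
            System.out.println("lines read after delete: " + reader.lines().count());
        }
    }
}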
I have a given number (5-7) of large UTF-8 text files, about 7 MB each. Decoded into Java's two-byte chars, their size is about 15 MB each.
I need to load given parts of a given file. The files are known and do not change. I would like to access and load lines at a given place as fast as possible. I load these lines, add HTML tags, and display them in a JEditorPane. I know the bottleneck will be the JEditorPane's rendering of the generated HTML, but for now I would like to concentrate on the file access performance.
Moreover the user can search for a given word in all the files.
For now the code I use is :
private static void loadFile(String filename, int startLine, int stopLine) {
    StringBuilder sb = new StringBuilder();
    try {
        FileInputStream fis = new FileInputStream(filename);
        InputStreamReader isr = new InputStreamReader(fis, "UTF8");
        BufferedReader reader = new BufferedReader(isr);
        // skip the lines before the requested part
        for (int j = 0; j < startLine; j++) {
            reader.readLine();
        }
        for (int j = startLine; j <= stopLine; j++) {
            // here I add HTML tags
            // or do string comparison in case of search by the user
            sb.append(reader.readLine());
        }
        reader.close();
    } catch (FileNotFoundException e) {
        System.out.println(e);
    } catch (IOException e) {
        System.out.println(e);
    }
}
Now my questions:
As the number of parts of each file is known, 67 in my case (for each file), I could create 67 smaller files. It would be "faster" to load a given part, but slower when I do a search, as I must open each of the 67 files.
I have not done any benchmarking, but my feeling is that opening 67 files during a search takes much longer than the read-and-discard readLine() calls needed to skip to a given part of a single file.
So in my case it is better to have a single larger file. Do you agree with that?
If I put each large file in the resources, I mean in the jar file, will the performance be worse, and if yes, is it significantly worse?
A related question is what happens if I zip each file to save space. As far as I understand, a jar file is simply a zip file.
I don't know how unzipping works: if I zip a file, will the file be decompressed in memory, or will my program be able to access the given lines I need directly on disk?
The same goes for the jar file: will it be decompressed in memory?
If unzipping is not done in memory, can someone edit my code to use a zip file?
Final question, and the most important for me: I could increase all the performance if everything were performed in memory, but due to Unicode and the quite large files, this could easily result in a heap of more than 100 MB. Is there a possibility of having the zip file loaded in memory and working on that? That would be fast and use only a little memory.
Summary of the questions
1. In my case, is one large file better than plenty of small ones?
2. If the files are zipped, is the unzip process (GZIPInputStream) performed in memory? Is the whole file unzipped in memory and then accessed, or is it possible to access it directly on disk?
3. If yes to question 2, can someone edit my code to do it?
4. MOST IMPORTANT: is it possible to have the zip file loaded in memory, and how?
I hope my questions are clear enough. ;-)
UPDATE: Thanks to Mike for the getResourceAsStream hint; I got it working.
Note that benchmarking shows that loading the gzip file is efficient, but in my case it is too slow:
~200 ms for the gzip file
~125 ms for the standard file, so 1.6 times faster.
Assuming that the resource folder is called resources:
private static void loadFile(String filename, int startLine, int stopLine) {
    StringBuilder sb = new StringBuilder();
    try {
        // "MyClass" stands for the enclosing class;
        // "this" is not available in a static method
        GZIPInputStream zip = new GZIPInputStream(
                MyClass.class.getResourceAsStream("resources/" + filename));
        InputStreamReader isr = new InputStreamReader(zip, "UTF8");
        BufferedReader reader = new BufferedReader(isr);
        // skip the lines before the requested part
        for (int j = 0; j < startLine; j++) {
            reader.readLine();
        }
        for (int j = startLine; j <= stopLine; j++) {
            // here I add HTML tags
            // or do string comparison in case of search by the user
            sb.append(reader.readLine());
        }
        reader.close();
    } catch (FileNotFoundException e) {
        System.out.println(e);
    } catch (IOException e) {
        System.out.println(e);
    }
}
If the files really aren't changing very often, I would suggest using some other data structures. Creating a hash table of all the words and the locations where they show up would make searching much faster, and creating an index of all the line start positions would make loading parts much faster (see the sketch after the answers below).
But, to answer your questions more directly:
Yes, one large file is probably still better than many small files. I doubt that reading a line and decoding from UTF-8 will be noticeable compared to opening many files, or decompressing many files.
Yes, the unzipping process is performed in memory, and on the fly. It happens as you request data, but acts like a buffered stream: it decompresses entire blocks at a time, so it is actually very efficient.
I can't fix your code directly, but I can suggest looking up getResourceAsStream:
http://docs.oracle.com/javase/6/docs/api/java/lang/Class.html#getResourceAsStream%28java.lang.String%29
This function will open a file that is in a zip / jar file and give you access to it as a stream, automatically decompressing it in memory as you use it.
If you treat it as a resource, Java will do it all for you. You will have to read up on some of the specifics of handling resources, but Java should handle it fairly intelligently.
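To illustrate the line-index suggestion above: a sketch that records the byte offset of each line start so later loads can seek straight to a given part. This only works on a plain file on disk (a compressed stream inside a jar cannot be seeked), and the file name is a placeholder:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

public class LineIndex {
    // Record the byte offset at which each line starts.
    static List<Long> buildIndex(String filename) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(filename, "r")) {
            offsets.add(0L);
            while (raf.readLine() != null) {
                offsets.add(raf.getFilePointer());
            }
        }
        return offsets;
    }

    // Jump straight to line n instead of rereading from the start.
    // Note: RandomAccessFile.readLine does not decode UTF-8; for non-ASCII
    // text, read the raw bytes between two offsets and decode them instead.
    static String readLine(String filename, List<Long> offsets, int n) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(filename, "r")) {
            raf.seek(offsets.get(n));
            return raf.readLine();
        }
    }
}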
I think it would be quicker for you to load the file(s) into memory. You can then jump around to whatever part of the file you need.
Take a look at RandomAccessFile for this.
The GZIPInputStream reads the file into memory as a buffered stream.
That's another question entirely :)
Again, the zip file will be decompressed in memory, depending on what class you use to open it.
I have to edit the contents of a file and write the edited content to another file. Here is the code I am using.
import java.io.*;
import java.util.*;

public class TestRef {

    ArrayList<String> lines = new ArrayList<String>();
    String line = null;

    public void printThis() {
        try {
            FileReader fr = new FileReader("C:\\Users\\questions.txt");
            BufferedReader br = new BufferedReader(fr);
            FileWriter fw = new FileWriter("C:\\Users\\questions_out.txt");
            BufferedWriter out = new BufferedWriter(fw);
            while ((line = br.readLine()) != null) {
                if (line.contains("Javascript"))
                    line.replace("Javascript", " JAVA");
                lines.add(line);
                out.write(line);
            }
        } catch (Exception e) {}
    }

    public static void main(String[] args) {
        TestRef tr = new TestRef();
        tr.printThis();
    }
}
So this is like reading one line at a time and printing it back to the file. But when I execute this code, the output file is blank. Can you please provide me with a sample showing how to read from a file, make a change in the content, and write the whole file to a new file?
Well, a few problems:
You're never closing either your input or your output. Closing will also flush - it's possible that something's just not being flushed. You should close stream-based resources in a finally block, so that they end up being closed even in the face of an exception. (Given that you should be closing, I wouldn't bother explicitly flushing as well. Just make sure you close the top-level abstraction - i.e. out (and br).)
You're catching Exception and then swallowing it. It could well be that an exception is being thrown, but you're not able to tell because you've swallowed it. You should at least be logging it, and probably stopping at that point. (I'd also suggest catching IOException instead of Exception.)
You're using FileWriter and FileReader which doesn't allow you to specify the input/output encoding - not the issue here, but personally I like to take more control over the encodings I use. I'd suggest using FileInputStream and FileOutputStream wrapped in InputStreamReader and OutputStreamWriter.
You're calling String.replace() and ignoring the result. Strings are immutable - calling replace won't change the existing string. You want:
line = line.replace("Javascript"," JAVA");
You're never using your lines variable, and your line variable would be better as a local variable. It's only relevant within the method itself, so only declare it in the method.
Your code would be easier to follow if it were more appropriately indented. If you're using an IDE, it should be able to do this for you - it makes a huge difference in readability.
The first one is the most likely cause of your current problem, but the rest should help when you're past that. (The point about "replace" will probably be your next issue...)
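Putting those points together, one corrected version might look like this; a sketch using the same paths as the question, with try-with-resources (Java 7+) doing the closing and flushing, and explicit encodings as suggested:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class TestRef {
    public void printThis() {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                 new FileInputStream("C:\\Users\\questions.txt"), StandardCharsets.UTF_8));
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                 new FileOutputStream("C:\\Users\\questions_out.txt"), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.replace("Javascript", " JAVA"); // keep the returned string
                out.write(line);
                out.newLine(); // readLine() strips the terminator, so add it back
            }
        } catch (IOException e) {
            e.printStackTrace(); // at least log the exception instead of swallowing it
        }
    }

    public static void main(String[] args) {
        new TestRef().printThis();
    }
}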
You are missing out.flush().
BufferedWriters don't write anything until either you flush them, or their buffer fills up.
Close the writer, outside the loop:
out.flush();
out.close();
Moreover, readLine() does not include the line terminator, so if you just want to replace Javascript with Java, you also need to write a '\n' to the new file wherever the old file contains a new line.
I have Java code that reads through an input file using a BufferedReader until the readLine() method returns null. I need to use the contents of the file again an indefinite number of times. How can I read this file from the beginning again?
You can close and reopen it again. Another option: if it is not too large, put its contents into, say, a List.
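A sketch of that second option; the path is a placeholder:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ReadIntoMemory {
    public static void main(String[] args) throws IOException {
        // Read the whole file once; the list can then be iterated any number of times.
        List<String> lines = Files.readAllLines(Paths.get("input.txt"), StandardCharsets.UTF_8);
        for (int pass = 0; pass < 2; pass++) {
            for (String line : lines) {
                System.out.println(line);
            }
        }
    }
}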
BufferedReader supports reset() only to a position within its buffered data. It cannot go back to the beginning of the file if the file is larger than the buffer.
Solutions:
1. Reopen the file
2. Use RandomAccessFile
A single Reader should be used once to read the file. If you want to read the file again, create a new Reader based on it.
Using Guava's IO utilities, you can create a nice abstraction that lets you read the file as many times as you want, using Files.newReaderSupplier(File, Charset). This gives you an InputSupplier<InputStreamReader> from which you can retrieve a new Reader by calling getInput() at any time.
Even better, Guava has many utility methods that make use of InputSuppliers directly... this saves you from having to worry about closing the supplied Reader yourself. The CharStreams class contains most of the text-related IO utilities. A simple example:
public void doSomeStuff(InputSupplier<? extends Reader> readerSupplier) throws IOException {
    boolean needToDoMoreStuff = true;
    while (needToDoMoreStuff) {
        // this handles creating, reading, and closing the Reader!
        List<String> lines = CharStreams.readLines(readerSupplier);
        // do some stuff with the lines you read
    }
}
Given a File, you could call this method like:
File file = ...;
doSomeStuff(Files.newReaderSupplier(file, Charsets.UTF_8)); // or whatever charset
If you want to do some processing for each line without reading every line into memory first, you could alternatively use the readLines overload that takes a LineProcessor.
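For example, counting matching lines with a LineProcessor might look like this; a sketch assuming the same pre-Guava-15 API as the answer (in current Guava the equivalent is Files.asCharSource(file, UTF_8).readLines(processor)), with a placeholder path and predicate:

import com.google.common.base.Charsets;
import com.google.common.io.CharStreams;
import com.google.common.io.Files;
import com.google.common.io.LineProcessor;
import java.io.File;
import java.io.IOException;

public class CountMatches {
    public static void main(String[] args) throws IOException {
        File file = new File("input.txt");
        // Counts matching lines without holding the whole file in memory.
        int matches = CharStreams.readLines(
                Files.newReaderSupplier(file, Charsets.UTF_8),
                new LineProcessor<Integer>() {
                    private int count;
                    @Override public boolean processLine(String line) {
                        if (line.startsWith("word")) count++;
                        return true; // keep reading
                    }
                    @Override public Integer getResult() {
                        return count;
                    }
                });
        System.out.println(matches);
    }
}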
You can do this by calling the run() function recursively, after checking to see whether any more lines can be read - here's a sample:
// Reload the file when you reach the end (i.e. when you can't read any more strings)
if ((sCurrentLine = br.readLine()) == null) {
    run();
}
If you want to do this, you may want to consider a random access file. With that you can explicitly set the position back to the beginning and start reading again from there.
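A sketch of the RandomAccessFile approach; the path is a placeholder, and note that RandomAccessFile.readLine() does not decode UTF-8:

import java.io.IOException;
import java.io.RandomAccessFile;

public class RereadFile {
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile("input.txt", "r")) {
            for (int pass = 0; pass < 2; pass++) {
                String line;
                while ((line = raf.readLine()) != null) {
                    System.out.println(line);
                }
                raf.seek(0); // jump back to the beginning for the next pass
            }
        }
    }
}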
I would suggest using the Commons IO library:
http://commons.apache.org/io/api-release/org/apache/commons/io/FileUtils.html
I think there is a call to just read the file into a byte array, which might be an alternate approach.
Not sure if you have considered the mark() and reset() methods on the BufferedReader.
They can be an option if your files are only a few MB in size: you can set the mark at the beginning of the file and keep calling reset() once you hit the end of the file. It also appears that subsequent reads on the same file will be served entirely from the buffer without having to go to the disk.
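A sketch of that idea; the read-ahead limit must be at least the file size, which is why this only suits small files, and the path is a placeholder:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class MarkResetReread {
    public static void main(String[] args) throws IOException {
        int limit = 10 * 1024 * 1024; // must be at least the file size in chars
        try (BufferedReader br = new BufferedReader(new FileReader("input.txt"), limit)) {
            br.mark(limit);                    // remember the start of the file
            System.out.println(br.readLine());
            br.reset();                        // rewind to the mark
            System.out.println(br.readLine()); // reads the first line again
        }
    }
}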
I faced the same issue and came across this question.
1. Using mark() and reset() methods:
A BufferedReader can be created using a FileReader and also a FileInputStream. FileReader doesn't support the mark and reset methods; I got an exception when I tried to use them. Even when I tried with FileInputStream, I wasn't able to do it, because my file was large (I guess yours is too). If the file length is larger than the buffer, then mark and reset won't work with either FileReader or FileInputStream. More on this in this answer by @jtahlborn.
2. Closing and reopening the file
When I closed and reopened the file and created a new BufferedReader, it worked well.
The ideal way, I guess, is to reopen the file and construct a new BufferedReader, as a FileReader or FileInputStream should be used only once to read the file.
try {
    BufferedReader br = new BufferedReader(new FileReader(input));
    String line;
    while ((line = br.readLine()) != null) {
        // do something
    }
    br.close();
} catch (IOException e) {
    System.err.println("Error: " + e.getMessage());
}