Java: References and GC

I'm new to Java programming, and I somewhat understand how references and the garbage collector work, but I need some suggestions.
If, for example, I need to read from files, and I'm using a loop to go through each file and read the text from it, should I avoid doing something like the following:
(br is an instance of BufferedReader)
br = new BufferedReader(new FileReader("filePath"));
So basically, each time the loop executes, br references a new BufferedReader object. Is this the wrong way of doing it? And if it is, what can I do to make it more efficient?
Thank you in advance for any help you can provide.
Full code:
public int kerko(String folderName, String wantedWord) throws IOException {
    File file = new File(folderName);
    int count = 0;
    if (file.isDirectory()) {
        File[] files = file.listFiles();
        for (File f : files) {
            if (f.isFile() && f.getName().endsWith(".txt")) {
                br = new BufferedReader(new FileReader(f.getAbsolutePath()));
                String line = br.readLine();
                while (line != null) {
                    if (line.toLowerCase().contains(wantedWord)) {
                        count++;
                    }
                    line = br.readLine();
                }
                br.close();
            }
            count += kerko(f.getAbsolutePath(), wantedWord);
        }
    }
    return count;
}

It's OK to instantiate BufferedReaders and FileReaders this way.
After leaving the { } block, these objects become unreachable, and the GC will collect them at some later point.

It's absolutely fine to assign multiple objects to the same variable one after another. The garbage collector knows which objects are no longer referenced.
My general advice concerning garbage collection: unless you are doing some really advanced stuff, don't think about it. That's what the garbage collector is there for.
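What is worth doing, regardless of GC, is making sure each reader is closed even when an exception is thrown. A minimal sketch of the loop body using try-with-resources (Java 7+; this is an addition of mine, not part of the original question):
try (BufferedReader br = new BufferedReader(new FileReader(f.getAbsolutePath()))) {
    String line;
    while ((line = br.readLine()) != null) {
        if (line.toLowerCase().contains(wantedWord)) {
            count++;
        }
    }
}
// The reader is closed here even on exceptions; the BufferedReader object
// becomes unreachable after each iteration and can be collected.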


How to sort N files

Following this answer -->
How do I sort very large files
I need only the merge step over N already-sorted files on disk; I want to merge them into one big file. My limitation is memory: not more than K lines in memory at once (K < N), so I cannot fetch all of them and then sort. Java is preferred.
So far I have tried the code below, but I need a good way to iterate over all N files line by line (with no more than K lines in memory) and to write the sorted final file to disk.
public void run() {
    try {
        System.out.println(file1 + " Started Merging " + file2);
        FileReader fileReader1 = new FileReader(file1);
        FileReader fileReader2 = new FileReader(file2);
        //......TODO with N ?? ......
        FileWriter writer = new FileWriter(file3);
        BufferedReader bufferedReader1 = new BufferedReader(fileReader1);
        BufferedReader bufferedReader2 = new BufferedReader(fileReader2);
        String line1 = bufferedReader1.readLine();
        String line2 = bufferedReader2.readLine();
        // Merge the 2 files based on which string is greater.
        while (line1 != null || line2 != null) {
            if (line1 == null || (line2 != null && line1.compareTo(line2) > 0)) {
                writer.write(line2 + "\r\n");
                line2 = bufferedReader2.readLine();
            } else {
                writer.write(line1 + "\r\n");
                line1 = bufferedReader1.readLine();
            }
        }
        System.out.println(file1 + " Done Merging " + file2);
        // close the streams before deleting the files they hold open
        bufferedReader1.close();
        bufferedReader2.close();
        writer.close();
        new File(file1).delete();
        new File(file2).delete();
    } catch (Exception e) {
        System.out.println(e);
    }
}
Regards,
You can use something like this:
public static void mergeFiles(String target, String... input) throws IOException {
    String lineBreak = System.getProperty("line.separator");
    PriorityQueue<Map.Entry<String, BufferedReader>> lines
        = new PriorityQueue<>(Map.Entry.comparingByKey());
    try (FileWriter fw = new FileWriter(target)) {
        String header = null;
        for (String file : input) {
            BufferedReader br = new BufferedReader(new InputStreamReader(
                Files.newInputStream(Paths.get(file), StandardOpenOption.DELETE_ON_CLOSE)));
            String line = br.readLine();
            if (line == null) br.close();
            else {
                if (header == null) fw.append(header = line).write(lineBreak);
                line = br.readLine();
                if (line != null) lines.add(new AbstractMap.SimpleImmutableEntry<>(line, br));
                else br.close();
            }
        }
        for (;;) {
            Map.Entry<String, BufferedReader> next = lines.poll();
            if (next == null) break;
            fw.append(next.getKey()).write(lineBreak);
            final BufferedReader br = next.getValue();
            String line = br.readLine();
            if (line != null) lines.add(new AbstractMap.SimpleImmutableEntry<>(line, br));
            else br.close();
        }
    } catch (Throwable t) {
        for (Map.Entry<String, BufferedReader> br : lines) try {
            br.getValue().close();
        } catch (Throwable next) {
            if (t != next) t.addSuppressed(next);
        }
        throw t;
    }
}
Note that this code, unlike the code in your question, handles the header line. Like the original code, it will delete the input files. If that's not intended, you can remove the DELETE_ON_CLOSE option and simplify the entire reader construction to
BufferedReader br = new BufferedReader(new FileReader(file));
It holds exactly as many lines in memory as you have files.
While it is possible, in principle, to hold fewer line strings in memory and re-read them when needed, it would be a performance disaster for a questionable little saving. E.g., you already have N strings in memory when calling this method, simply because you have N file names.
However, if you want to reduce the number of lines held at the same time, at all costs, you can simply use the method shown in your question. Merge the first two files into a temporary file, merge that temporary file with the third into another temporary file, and so on, until merging the temporary file with the last input file yields the final result. Then you have at most two line strings in memory (K == 2), saving less memory than the operating system will use for buffering while trying to mitigate the horrible performance of this approach.
Likewise, you can use the method shown above to merge K files into a temporary file, then merge the temporary file with the next K-1 files, and so on, until merging the temporary file with the remaining K-1 or fewer files yields the final result, to get a memory consumption scaling with K < N. This approach allows tuning K to a reasonable ratio to N, trading memory for speed. I think, in most practical cases, K == N will work just fine. A sketch of such a driver loop follows.
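A minimal sketch of that sequential pass, assuming the mergeFiles method above and the usual java.util imports (the name mergeInPasses and the temp-file naming are my own, hypothetical choices):
// Repeatedly merge the previous result with up to K-1 further inputs (K >= 2),
// so no call ever has more than K readers, and thus K lines, open at once.
static void mergeInPasses(String target, List<String> inputs, int k) throws IOException {
    String current = null; // result of the previous pass, if any
    int i = 0, tmp = 0;
    while (i < inputs.size()) {
        List<String> batch = new ArrayList<>();
        if (current != null) batch.add(current);
        while (batch.size() < k && i < inputs.size()) batch.add(inputs.get(i++));
        // write the last pass directly to the target, earlier passes to temp files
        String out = (i < inputs.size()) ? target + ".tmp" + tmp++ : target;
        mergeFiles(out, batch.toArray(new String[0]));
        current = out; // inputs are deleted by DELETE_ON_CLOSE inside mergeFiles
    }
}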
@Holger gave a nice answer assuming that K >= N.
You can extend it to the K < N case by using the mark(int) and reset() methods of the BufferedReader.
The parameter of mark is how many bytes a single line can have.
The idea is as follows:
Instead of putting all N lines into the set, you only keep K of them. Whenever you put a new line into the set and it is already 'full', you evict the smallest one from it. Additionally, you reset the stream it came from, so when you read it again the same data can pop up.
You have to keep track of the maximum line not kept in the set; let's call it the lower bound. Once there are no elements in the set greater than the maintained lower bound, you scan all the files once again and repopulate the set.
I'm not sure whether this approach is optimal, but it should be OK.
Moreover, you have to be aware that the BufferedReader has an internal buffer at least the size of a single line, so that will consume a lot of your memory; perhaps it would be better to maintain buffering on your own.
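A tiny illustration of the mark/reset mechanics on a BufferedReader (the 1 << 16 read-ahead limit is an assumed maximum line length, not something from the original answer):
BufferedReader br = new BufferedReader(new FileReader("part.txt"));
br.mark(1 << 16);           // remember this position; valid for up to 64K chars read ahead
String line = br.readLine();
// ... if the line gets evicted from the K-element set ...
br.reset();                 // rewind, so the same line can be read again later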

Garbage Collection for Strings

I have a text file which I need to read line by line and do some processing on each line.
ConcurrentMap<String, String> hm = new ConcurrentHashMap<>();
InputStream is = Thread.currentThread().getContextClassLoader().getResourceAsStream("filename.txt");
InputStreamReader stream = new InputStreamReader(is, StandardCharsets.UTF_8);
BufferedReader reader = new BufferedReader(stream);
String line;
while (true) {
    line = reader.readLine();
    if (line == null) {
        break;
    }
    String text = line.substring(0, line.lastIndexOf(",")).trim();
    String id = line.substring(line.lastIndexOf(",") + 1).trim();
    hm.put(text, id);
}
I need to know when the strings created by the substring() and trim() operations will be garbage collected.
Also, what about the String line?
The intermediate strings become eligible for garbage collection as soon as they are unreachable, which for the temporary results of substring() and trim() is at the end of each iteration of the while loop; the same goes for the string referenced by line once it is reassigned. But from a memory-usage point of view this is a moot point, because the strings you store as keys and values remain reachable through the map, which does not go out of scope.
If you include information about how you are using this map, maybe a solution can be given which avoids having to store everything in memory.
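For instance, if each line can be handled independently, a streaming variant avoids retaining anything at all (process is a hypothetical placeholder for your per-line work, not something from your code):
try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        Thread.currentThread().getContextClassLoader().getResourceAsStream("filename.txt"),
        StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) {
        int comma = line.lastIndexOf(',');
        process(line.substring(0, comma).trim(), line.substring(comma + 1).trim());
        // after this iteration, both substrings are unreachable and eligible for GC
    }
}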

Java 6: Copy and manipulate files

What I want to do...
I have XML files with names like SomeName999999blablabla.xml with lots of content, where almost every line contains the string "999999". I need identical XML files where 999999 is replaced by 888888, 777777, and so on, both in the name and in the file's content.
The problem...
My code works fine and actually creates all the files I need, BUT there are sometimes tiny errors. Like in one line an E is "randomly" replaced by a D (it seems to always be one letter lower than what it's supposed to be, but I can't confirm that 100%). It's not a lot, maybe one or two instances in 60 files, each file being about 100MB. But since it's XML, this is a real problem, as it often causes a schema violation, which crashes later processing.
I have absolutely no idea where this is coming from or how to fix it, please help.
My code so far...
private void createMandant(String mandant) throws Exception {
    String line;
    File dir = new File(TestConstants.getXmlDirectory());
    for (File file : dir.listFiles()) {
        if (file.getName().endsWith(".xml") && file.getName().contains("999999")) {
            BufferedReader br = new BufferedReader(new FileReader(file));
            FileWriter fw = new FileWriter(file.getAbsolutePath().replace("999999", mandant));
            while ((line = br.readLine()) != null) {
                fw.write(line.replace("999999", mandant) + "\r\n");
            }
            br.close();
            fw.close();
        }
    }
}
Environment...
We are on Java 6. As mentioned before, the files are quite large: around 100MB, several hundred thousand lines each.
It appears to be a problem with String.replace().
I have replaced it with StringBuilder:
while ((line = br.readLine()) != null) {
    index = 0;
    // fw.write(line.replace("999999", mandant) + "\r\n");
    StringBuilder builder = new StringBuilder(line);
    index = builder.indexOf("999999");
    if (index >= 0) { // indexOf returns -1 when absent; >= 0 also catches a match at the start of the line
        // note: unlike String.replace, this replaces only the first occurrence in the line
        fw.write(builder.replace(index, index + 6, mandant).toString() + "\r\n");
    } else {
        fw.write(line + "\r\n");
    }
}
... and now it seems to work. Two runs have already completed without any problems.
But that seems very strange. Could it really be that a heavily used function like String.replace() just randomly gets single letters wrong every few million method calls?
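One way to test that suspicion directly is a quick sanity harness (my own suggestion, not from the original post): hammer String.replace with a known input and check every result. If this never fails, the corruption is probably not a deterministic library bug.
String in = "SomeName999999blablabla";
String expected = "SomeName888888blablabla";
for (long i = 0; i < 100000000L; i++) { // no underscore literals, to stay Java 6 compatible
    if (!in.replace("999999", "888888").equals(expected))
        throw new AssertionError("mismatch at iteration " + i);
}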

Returning the number of lines in a .txt file

This is my debut question here, so I will try to be as clear as I can.
I have a sentences.txt file like this:
Galatasaray beat Juventus 1-0 last night.
I'm going to go wherever you never can find me.
Papaya is such a delicious thing to eat!
Damn lecturer never gives more than 70.
What's in your mind?
Obviously there are 5 sentences, and my objective is to write a listSize method that returns the number of sentences listed here.
public int listSize()
{
    // the code is supposed to be here.
    return sentence_total;
}
All help is appreciated.
To read a file and count its lines, use a java.io.LineNumberReader plugged on top of a FileReader. Call readLine() on it until it returns null, then getLineNumber() to get the last line number, and you're done!
Alternatively (Java 7+), you can use the NIO2 Files class to fully read the file at once into a List<String>, then return the size of that list.
BTW, I don't understand why your method takes that int as a parameter, if it's supposed to be the value to compute and return?
Using LineNumberReader:
LineNumberReader reader = new LineNumberReader(new FileReader(new File("sentences.txt")));
reader.skip(Long.MAX_VALUE);
System.out.println(reader.getLineNumber() + 1); // +1 because line index starts at 0
reader.close();
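The readLine()-until-null variant described above looks like this (try-with-resources is Java 7+; on older versions, close in a finally block):
try (LineNumberReader reader = new LineNumberReader(new FileReader("sentences.txt"))) {
    while (reader.readLine() != null) {
        // just advance; the reader counts the lines
    }
    System.out.println(reader.getLineNumber()); // number of lines read
}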
Use the following code to get the number of lines in that file:
try {
    File file = new File("filePath");
    BufferedReader reader = new BufferedReader(new FileReader(file));
    String line;
    int totalLines = 0;
    while ((line = reader.readLine()) != null) {
        totalLines++;
    }
    reader.close();
    System.out.println(totalLines);
} catch (Exception ex) {
    ex.printStackTrace(System.err);
}
You could do:
Path file = Paths.get("route/to/myFile.txt");
int numLines = Files.readAllLines(file).size();
If you want to limit them or process them lazily:
Path file = Paths.get("route/to/myFile.txt");
long numLines = Files.lines(file).limit(maxLines).count(); // ideally inside try-with-resources, since the stream holds the file open

Java: create strings from BufferedReader and compare Strings

I am using Java + Selenium 1 to test a web application.
I have to read through a text file line by line using BufferedReader.readLine and compare the data that was found to another String.
Is there a way to assign each line a unique string? I think it would be something like this:
FileInputStream fstream = new FileInputStream("C:\\write.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
String[] strArray = null;
int p = 0;
// Read file line by line
while ((strLine = br.readLine()) != null) {
    strArray[p] = strLine;
    assertTrue(strArray[p].equals(someString));
    p = p + 1;
}
The problem with this is that you don't know how many lines there are, so you can't size your array correctly. Use a List<String> instead.
In order of decreasing importance:
You don't need to store the Strings in an array at all, as pointed out by Perception.
You don't know how many lines there are, so as pointed out by Qwerky, if you do need to store them you should use a resizable collection like ArrayList.
DataInputStream is not needed: you can just wrap your FileInputStream directly in an InputStreamReader. A sketch combining these points follows this list.
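A minimal sketch under those three points, assuming the someString constant and an assertTrue from your test framework, as in your snippet:
List<String> lines = new ArrayList<String>(); // grows as needed, if you need to keep the lines at all
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("C:\\write.txt")));
String strLine;
while ((strLine = br.readLine()) != null) {
    lines.add(strLine);
    assertTrue(strLine.equals(someString));
}
br.close();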
You may want to try something like:
public final static String someString = "someString";
public boolean isMyFileOk(String filename) throws FileNotFoundException {
    // Scanner(String) would scan the string itself, so wrap the name in a File
    Scanner sc = new Scanner(new File(filename));
    boolean fileOk = true;
    while (sc.hasNextLine() && fileOk) {
        String line = sc.nextLine();
        fileOk = isMyLineOk(line);
    }
    sc.close();
    return fileOk;
}
public boolean isMyLineOk(String line) {
    return line.equals(someString);
}
The Scanner class is usually a great class to read files :)
And as suggested, you may check one line at a time instead of loading them all into memory before processing them. This may not be an issue if your file is relatively small, but you'd better keep your code scalable, especially when it does the exact same thing :)
