Garbage Collection for Strings - java

I have a text file which I need to read line by line and do some processing on each line.
ConcurrentMap<String, String> hm = new ConcurrentHashMap<>();
InputStream is = Thread.currentThread().getContextClassLoader().getResourceAsStream("filename.txt");
InputStreamReader stream = new InputStreamReader(is, StandardCharsets.UTF_8);
BufferedReader reader = new BufferedReader(stream);
while(true)
{
String line = reader.readLine();
if (line == null) {
break;
}
String text = line.substring(0, line.lastIndexOf(",")).trim();
String id = line.substring(line.lastIndexOf(",") + 1).trim();
hm.put(text,id);
}
I need to know, when will the strings created during the substring() and trim() operations be garbage collected?
Also, what about the String line?

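The strings are not collected the moment they go out of scope. They become eligible for garbage collection once nothing references them any more, and the collector reclaims them at some later point. For line and any intermediate results of substring() and trim(), that happens at the end of each iteration of the while loop. The final text and id strings, however, are stored in the map, so they stay reachable for as long as the map does. From a memory usage point of view the per-iteration garbage is therefore a moot point, because the data you keep lives in a map which will not go out of scope.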
If you include information about how you are using this map, maybe a solution can be given which avoids having to store everything in memory.
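If each pair only needs to be looked at once, one option is to hand it to a callback instead of storing it. The following is a minimal sketch under that assumption, using the same "text,id" line layout as the question. The class name CsvLineWalker, the method forEachEntry, and the choice to skip lines without a comma are all made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.BiConsumer;

// Hypothetical helper, not part of the original question.
final class CsvLineWalker {

    // Reads "text,id" lines and passes each pair to the handler without retaining it.
    static void forEachEntry(Reader source, BiConsumer<String, String> handler) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int comma = line.lastIndexOf(',');
                if (comma < 0) {
                    continue; // assumption: lines without a comma are skipped
                }
                String text = line.substring(0, comma).trim();
                String id = line.substring(comma + 1).trim();
                handler.accept(text, id);
                // Once this iteration ends, nothing here references line, text, or id,
                // so they are eligible for collection unless the handler kept them.
            }
        }
    }
}
```

A call such as CsvLineWalker.forEachEntry(reader, (text, id) -> process(text, id)) keeps only one line's worth of strings reachable at a time, where process stands for whatever you do with each pair.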

Related

How to split a byte array that contains multiple "lines" in Java?

Say we have a file like so:
one
two
three
(but this file got encrypted)
My crypto method returns the whole file in memory, as a byte[] type.
I know byte arrays don't have a concept of "lines", that's something a Scanner (for example) could have.
I would like to traverse each line, convert it to a string and perform my operation on it, but I don't know how to:
Find lines in a byte array
Slice the original byte array to "lines" (I would convert those slices to String, to send to my other methods)
Correctly traverse a byte array, where each iteration is a new "line"
Also: do I need to consider the different OS the file might have been composed in? I know that there is some difference between new lines in Windows and Linux and I don't want my method to work only with one format.
Edit: Following some tips from answers here, I was able to write some code that gets the job done. I still wonder if this code is worthy of keeping or I am doing something that can fail in the future:
byte[] decryptedBytes = doMyCrypto(fileName, accessKey);
ByteArrayInputStream byteArrInStrm = new ByteArrayInputStream(decryptedBytes);
InputStreamReader inStrmReader = new InputStreamReader(byteArrInStrm);
BufferedReader buffReader = new BufferedReader(inStrmReader);
String delimRegex = ",";
String line;
String[] values = null;
while ((line = buffReader.readLine()) != null) {
values = line.split(delimRegex);
if (Objects.equals(values[0], tableKey)) {
return values;
}
}
System.out.println(String.format("No entry with key %s in %s", tableKey, fileName));
return values;
In particular, I was advised to explicitly set the encoding but I was unable to see exactly where?
If you want to stream this, I'd suggest:
Create a ByteArrayInputStream to wrap your array
Wrap that in an InputStreamReader to convert binary data to text - I suggest you explicitly specify the text encoding being used
Create a BufferedReader around that to read a line at a time
Then you can just use:
String line;
while ((line = bufferedReader.readLine()) != null)
{
// Do something with the line
}
BufferedReader handles line breaks from all operating systems.
So something like this:
byte[] data = ...;
ByteArrayInputStream stream = new ByteArrayInputStream(data);
InputStreamReader streamReader = new InputStreamReader(stream, StandardCharsets.UTF_8);
BufferedReader bufferedReader = new BufferedReader(streamReader);
String line;
while ((line = bufferedReader.readLine()) != null)
{
System.out.println(line);
}
Note that in general you'd want to use try-with-resources blocks for the streams and readers - but it doesn't matter in this case, because it's just in memory.
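To show where the encoding goes, here is the same approach wrapped in a small method. It is only a sketch: the class name ByteLines and the method toLines are invented for illustration, and the caller has to know which charset the decrypted bytes were written in. The charset is the second argument of the InputStreamReader constructor.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
final class ByteLines {

    // Decodes the bytes with the given charset and splits them into lines.
    // readLine() accepts \n, \r and \r\n as line terminators.
    static List<String> toLines(byte[] data, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

For the code in the question this would be called as ByteLines.toLines(decryptedBytes, StandardCharsets.UTF_8), assuming the file was written as UTF-8.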
As Scott states, I would like to see what you came up with so we can help you alter it to fit your needs.
Regarding your last comment about the OS: if you want to support multiple file types, you should consider making several functions that support those different file extensions. As far as I know, you do need to specify which file and what type of file you are reading with your code.

Java: References and GC

I'm new to Java programming, and I somewhat understand how references and the garbage collector work, but I need some suggestions.
If (for example), I need to read from files, and I'm using a loop to go through each file and read the text from them, should I avoid doing something like:
(br is an instance of BufferedReader)
br = new BufferedReader(new FileReader("filePath"));
So basically, each time the loop executes, br refers to a new BufferedReader object. Is this the wrong way of doing it? And if it is, what can I do to make it work more efficiently?
Thank you in advance for any help you can provide.
Full code:
public int kerko(String folderName, String wantedWord) throws IOException{
File file = new File(folderName);
int count = 0;
if(file.isDirectory()){
File[] files = file.listFiles();
for(File f: files){
if(f.isFile() && f.getName().endsWith(".txt")){
br = new BufferedReader(new FileReader(f.getAbsolutePath()));
String line = br.readLine();
while(line != null){
if(line.toLowerCase().contains(wantedWord)){
count++;
}
line = br.readLine();
}
br.close();
}
count += kerko(f.getAbsolutePath(), wantedWord);
}
}
return count;
}
It's OK to instantiate BufferedReaders and FileReaders in this way.
After leaving the { } block these objects will be unreachable, and the GC will collect them later.
It's absolutely fine to assign multiple objects to the same variable one after another. The garbage collector knows which objects are no longer referenced.
My general advice concerning garbage collection: Unless you do some really advanced stuff, don't think about it. That's what the garbage collector is made for.
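What does deserve attention is closing each reader, since the garbage collector reclaims memory but does not promise to release file handles promptly. Below is a minimal sketch of the per-file loop with try-with-resources. The class name WordCounter and the method countMatchingLines are made up for illustration, and two things differ from the question's code by assumption: files are read as UTF-8, and the search word is lowercased as well so the comparison is case-insensitive on both sides.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not the asker's original method.
final class WordCounter {

    // Counts the lines that contain wantedWord (ignoring case) across the given files.
    static int countMatchingLines(List<Path> files, String wantedWord) throws IOException {
        String needle = wantedWord.toLowerCase();
        int count = 0;
        for (Path file : files) {
            // A fresh reader per file is fine; it is closed when the try block ends,
            // even if readLine() throws.
            try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String line;
                while ((line = br.readLine()) != null) {
                    if (line.toLowerCase().contains(needle)) {
                        count++;
                    }
                }
            }
        }
        return count;
    }
}
```

Declaring br inside the try header also keeps the variable local to one file, so no reference to an old reader outlives its iteration.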

How to split a very long string

I have a big file (about 30 MB) and here is the code I use to read data from the file:
StringBuilder sb = new StringBuilder();
BufferedReader br = new BufferedReader(new FileReader(file));
try {
String line = br.readLine();
while (line != null) {
sb.append(line).append("\n");
line = br.readLine();
}
} finally {
// Close the reader even if reading fails
br.close();
}
Then I need to split the content I read, so I use
String[] inst = sb.toString().split("GO");
The problem is that sometimes the sub-string is over the maximum String length and I can't get all the data inside the string. How can I get rid of this?
Thanks
Use a Scanner with "GO" as the delimiter: Scanner s = new Scanner(input).useDelimiter("GO"); and then call s.next() to get each piece.
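The Scanner suggestion can be sketched as follows. This is an illustration under assumptions, not a tested fix for the 30 MB case: the class name GoSplitter and the method forEachBatch are invented, and the delimiter is the plain text "GO" exactly as in the question's split("GO"), so it also matches "GO" inside longer words.

```java
import java.io.Reader;
import java.util.Scanner;
import java.util.function.Consumer;

// Hypothetical helper, not part of the original answer.
final class GoSplitter {

    // Feeds each piece between "GO" delimiters to the handler, one at a time,
    // without first building one huge String of the whole input.
    static void forEachBatch(Reader input, Consumer<String> handler) {
        // Closing the Scanner also closes the underlying reader.
        try (Scanner s = new Scanner(input).useDelimiter("GO")) {
            while (s.hasNext()) {
                handler.accept(s.next());
            }
        }
    }
}
```

Passing a BufferedReader over the file, for example GoSplitter.forEachBatch(new BufferedReader(new FileReader(file)), batch -> handle(batch)), lets the Scanner pull data as needed. Here handle is a placeholder for your own processing.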
Why it happens: the erroneous result may be caused by non-contiguous heap segments, as the CMS collector doesn't defragment memory.
(This does not answer the "how to solve it" part, though.)
You may opt to process the whole string in parts, e.g. using substring.

Readline is too slow - Anything Faster?

I am reading from a stream using a BufferedReader and InputStreamReader to build one long string from what the readers return. It gets up to over 100,000 lines and then throws a 500 error (call failed on the server). I am not sure what the problem is. Is there anything faster than this method? It works when the lines are in the thousands, but I am working with large data sets.
BufferedReader in = new BufferedReader(new InputStreamReader(newConnect.getInputStream()));
String inputLine;
String xmlObject = "";
StringBuffer str = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
str.append(inputLine);
str.toString();
}
in.close();
Thanks in advance
"to create one long string that gets created from the readers."
Are you by any chance doing this to create your "long string"?
String string;
while(...)
string+=whateverComesFromTheSocket;
If yes, then change it to
StringBuilder str = new StringBuilder(); //Edit:Just changed StringBuffer to StringBuilder
while(...)
str.append(whateverComesFromTheSocket);
String string = str.toString();
String objects are immutable, so when you do string += "something", new memory is allocated and the old contents plus "something" are copied into that newly allocated area. This is a costly operation, and running it 51,000 times is an extremely bad thing to do.
StringBuffer and StringBuilder are String's mutable brothers, and StringBuilder, being unsynchronized, is more efficient than StringBuffer.
readLine() can read at about 90 MB/s; it's what you are doing with the data you read that is slow. BTW, readLine() removes newlines, so the approach you are using is flawed, as it will turn everything into one line.
Rather than re-inventing the wheel, I would suggest you try FileUtils.readFileToString() from Apache Commons IO.
This will read a file as a String without discarding newlines, efficiently.
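If adding a library is not an option, the same effect can be had with the JDK alone by reading characters in blocks rather than lines, which keeps the newlines intact. A minimal sketch follows; the class name ReaderText, the method readAll, and the 8192-character buffer size are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper, not part of the original answer.
final class ReaderText {

    // Reads everything from the reader into one String, newlines included.
    // The caller remains responsible for closing the reader.
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            sb.append(buffer, 0, n); // append only the characters actually read
        }
        return sb.toString(); // convert once, after the loop
    }
}
```

With the question's stream this would be called as ReaderText.readAll(new InputStreamReader(newConnect.getInputStream(), StandardCharsets.UTF_8)), assuming the server sends UTF-8.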

Read last n line from url stream problem

I have a problem reading the last n lines from a URL. How can I do that? I have url.openStream(), but RandomAccessFile has no constructor that takes a stream. Can somebody help me? Is there maybe already a library for this? (I know how to implement it with RandomAccessFile when I have a file, but how do I turn a stream into a file?)
Open the URL stream as per usual.
Wrap the returned InputStream in a BufferedReader so you can read it line by line.
Maintain a LinkedList into which you will save the lines.
After reading each line from the BufferedReader:
Add the line to the list.
If the size of the list is greater than "n" then call LinkedList#removeFirst().
Once you have read all lines from the stream the list will contain the last "n" lines.
For example (untested, just for demonstration):
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
LinkedList<String> lines = new LinkedList<String>();
String line = null;
while ((line = in.readLine()) != null) {
lines.add(line);
if (lines.size() > nLines) {
lines.removeFirst();
}
}
// Now "lines" has the last "n" lines of the stream.
Sorry. You're going to have to do this one yourself. But don't worry because it's pretty simple.
You just need to keep track of the last n lines you have encountered since you started reading from the UrlStream. Might I suggest using a Queue?
Basically you could do something like
public String[] readLastNLines(final URL url, final int n) throws IOException{
final Queue<String> q = new LinkedList<String>();
final BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()));
String line=null;
while ((line = br.readLine())!=null)
{
q.add(line);
if (q.size()>n) q.remove();
}
return q.toArray(new String[q.size()]);
}
readLastNLines returns an array containing the last n lines read from url.
Unfortunately, you cannot use a RandomAccessFile with a stream from the Internet because streams are, by definition, not random access.
