So I understand that strings in Java are immutable. I'm interested in how to repeatedly update a certain string.
Ex:
public static void main(String[] args) {
String myString = "hey";
for (int i = 1; i <= 9; i++) {
myString += "hey";
}
}
Now, this won't work in Java because I've already declared and assigned myString. How do people get around Java's immutable strings (as in the above example)?
The only thing I can think to do is declare another string. Unfortunately, this just delays my problem, as the second time through the loop, I'll be reassigning an already assigned string:
public static void main(String[] args) {
String myString = "hey";
String secondString;
for (int i = 1; i <= 10; i++) {
secondString += "hey";
}
}
Any suggestions / explanations are much appreciated!
Thanks,
Mariogs
A Quick Answer
You should use a StringBuilder for this sort of thing. It is designed to put Strings together without constantly copying or holding on to older Strings.
public class SimpleGrowingString {
private StringBuilder stringBuilder;
public SimpleGrowingString() {
this.stringBuilder = new StringBuilder();
}
public void addToString(String str) {
this.stringBuilder.append(str);
}
public String getString() {
return this.stringBuilder.toString();
}
}
A Not So Quick Answer:
Immutable?
While Strings are immutable, you can re-assign a String variable.
The variable will then reference (point to) the new String assigned to it, and the old value becomes eligible for Garbage Collection: it will only hang about in RAM until the Garbage Collector gets off its arse and around to clearing it out. That is, as long as there are no other references to it (or subsections of it) still about somewhere.
Immutable means that you cannot change the String itself, not that you cannot reassign the variable that points to it.
e.g.
String str = "string one";
The String "string one" exists in memory and cannot be changed, modified, cut up, added to, etc.
It is immutable.
If I then say:
str = "a different string";
Then the variable str now references a different piece of data in memory; the String "a different string".
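The original String "string one" is still exactly the same String it was before; we've just told the handle we had for it to point to something else. The old String is still floating around in memory, but now it's headless and we no longer have any way to access that value. (Strictly speaking, a literal like "string one" stays referenced from the JVM's string pool, so it won't actually be collected; the point applies to Strings built at runtime, such as the results of concatenation.)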
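A minimal sketch of the point above; `ReassignDemo` and the second variable `other` are illustrative additions, not from the answer. Re-pointing `str` changes only what that variable refers to, while a second handle to the original String still sees it unchanged.

```java
public class ReassignDemo {
    public static void main(String[] args) {
        String str = "string one";
        String other = str;               // a second handle to the same String object

        str = "a different string";       // re-point str; the original String is untouched

        System.out.println(str);          // prints "a different string"
        System.out.println(other);        // prints "string one"
        System.out.println(str == other); // prints "false": two distinct objects now
    }
}
```

If `other` did not exist, nothing would refer to the old value any more, which is exactly the "headless" situation described above.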
This leads to the idea of Garbage Collection.
Garbage. Garbage Everywhere.
The Garbage Collector runs every now and again and cleans out old, unnecessary data that's no longer being used.
It decides what is and isn't useful, among other ways, by checking if there are any valid handles/variables currently pointing at the data. If there's nothing using it and there's no way for us to even access it anymore it's useless to us and it gets dumped.
But you can't ever really rely on the Garbage Collector to clean things out on time or quickly, or even get it to run when you want it to. It does its own thing, in its own time. You are better off trying to minimise its workload than assuming it's going to clean up after you all the time.
And now that you have an admittedly very basic grounding in Garbage Collection, we can talk about why you don't add Strings together:
String con+cat+en+a+tion
One big issue with using + for Strings (and the reason that StringBuilder and StringBuffer were designed) is that it creates an awful lot of Strings. Strings all over the place! Depending on your usage they may become candidates for Garbage Collection relatively quickly, but they can still lead to bloat if handled incorrectly, especially when loops get involved. The Garbage Collector runs whenever it damn well feels like it and we can't really control that, so the only way to be sure things don't get out of hand is to stop them getting that way ourselves.
Doing the simple String concatenation of:
"a String" + " another String"
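actually leads to having three Strings in memory (at least when one side is computed at runtime; two literals like these are folded into a single constant by the compiler):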
"a String", " another String" and "a String another String"
Expand this out to a simple, fairly short loop and you can see how things can get out of hand pretty quickly:
String str = "";
for (int i=0; i<=6; i++) {
str += "a chunk of RAM ";
}
Which means that, at each iteration, we have in memory:
0:
"a chunk of RAM "
1:
"a chunk of RAM "
"a chunk of RAM a chunk of RAM "
2:
"a chunk of RAM "
"a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM "
3:
"a chunk of RAM "
"a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM "
4:
"a chunk of RAM "
"a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM "
5:
"a chunk of RAM "
"a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM "
6:
"a chunk of RAM "
"a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM "
And so on and so on... You can see where that's going and how quickly it's getting there.
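To put rough numbers on the listing above, the sketch below (a hypothetical helper, counting characters only and ignoring object headers) adds up the lengths of the Strings that each `+=` has to build, assuming each pass copies the whole String so far plus the new chunk.

```java
public class ConcatCost {
    /**
     * Total characters in all the intermediate Strings produced by repeating
     * `str += chunk` `iterations` times, starting from "". Pass k (1-based)
     * builds a brand-new String of length k * chunk.length().
     */
    static long charsCopiedByConcat(String chunk, int iterations) {
        long total = 0;
        for (int k = 1; k <= iterations; k++) {
            total += (long) k * chunk.length();
        }
        return total;
    }

    public static void main(String[] args) {
        String chunk = "a chunk of RAM "; // 15 chars
        System.out.println(charsCopiedByConcat(chunk, 7));      // 420 (the loop above runs 7 times)
        System.out.println(7 * chunk.length());                 // 105, the length of the final String
        System.out.println(charsCopiedByConcat(chunk, 10_000)); // 750075000
    }
}
```

So the 7-pass loop builds 420 characters' worth of Strings to end up with a 105-character result, and the total grows roughly with the square of the number of passes.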
Moral of the story
If you are combining Strings in a loop, use a StringBuilder (or a StringBuffer, if it has to be shared between threads); it's what they were made for.
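Applied to the "a chunk of RAM " loop, a StringBuilder version might look like the sketch below; `BuilderLoop` and its method names are illustrative. The `concat` method keeps the original `+=` loop only so the two results can be compared.

```java
public class BuilderLoop {
    /** Same result as the += loop, built in one growable buffer. */
    static String build(int iterations) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < iterations; i++) {
            sb.append("a chunk of RAM "); // appends in place; no new String per pass
        }
        return sb.toString(); // one final String; the builder only copies when its buffer must grow
    }

    /** The original += version, for comparison. */
    static String concat(int iterations) {
        String str = "";
        for (int i = 0; i < iterations; i++) {
            str += "a chunk of RAM "; // a brand-new String every pass
        }
        return str;
    }

    public static void main(String[] args) {
        System.out.println(build(7).equals(concat(7))); // prints "true"
    }
}
```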
"Sidenote".substring(4);
Concatenating Strings isn't the only way to end up with a lot of wasted RAM because of String's immutability.
Just as we can't add to the end of a String, neither can we chop off the tail.
String moderatelyLongString = "a somewhat lengthy String that rambles on and on about nothing in particular for quite a while even though nobody's listening";
If perhaps we wanted to make use of only part of this String, we could use the String.substring() method:
String snippedString = moderatelyLongString.substring(0, 13);
System.out.println(snippedString);
>> a somewhat le
Ok, so what's wrong with that?
Well, if we wanted to dump the first, long String but hang onto the short bit, you might think that we can just say:
moderatelyLongString = null;
You may think that will leave the long String abandoned and alone, crying in a corner waiting for the GC to come and kick it out into the cold but you'd be wrong.
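Since we've still got a hold of a couple of characters of the longer chain in snippedString, the entire moderatelyLongString stays on the heap wasting space. (That is how substring() worked before Java 7 update 6, when a substring shared the original String's char[]; since then substring() copies the characters it keeps, so on a modern JVM the long String can be collected.)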
If you wanted to do this but shed the useless weight, what you would want to do is copy the shortened part without retaining a tie to the lengthy bit:
String aNicerMorePoliteShortString = new String(moderatelyLongString.substring(0, 13));
This makes a copy of the short String, taken from the long one, that is its own standalone array of characters and has nothing to do with that pestering hanger-on that is the long String.
Now doing this will, this time, make the long String eligible for Collection, as we have no remaining ties to it:
moderatelyLongString = null;
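A small sketch of the trimming idiom (`SubstringCopy` and `prefixCopy` are illustrative names). The copy constructor only matters on JVMs where `substring()` shares the parent's char[]; on a JVM whose `substring()` already copies, the extra copy is harmless but redundant.

```java
public class SubstringCopy {
    /** Returns the first n chars of s as a String with its own, independent storage. */
    static String prefixCopy(String s, int n) {
        return new String(s.substring(0, n)); // copy so no tie to s's characters remains
    }

    public static void main(String[] args) {
        String moderatelyLongString = "a somewhat lengthy String that rambles on and on about nothing in particular for quite a while even though nobody's listening";
        String snipped = prefixCopy(moderatelyLongString, 13);
        moderatelyLongString = null;  // the long String is now unreachable
        System.out.println(snipped);  // prints "a somewhat le"
    }
}
```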
However
If you just wanted to display a single, different String in a loop on every iteration what you want to do is just (re)use a single variable so that all of the older Strings in memory get released as soon as possible and become available for Garbage Collection as quickly as they can be. Declare your variable outside of the loop and then reassign it inside on every iteration:
String whatYouWantToUse;
for (int i=0; i<100; i++) {
whatYouWantToUse = someStringyGettyMethod();
howYouWantToUseIt(whatYouWantToUse);
}
Each time this loop loops it is assigning a new value to the variable which throws the older value onto the pile of waste for the Garbage Collector to clean up in time, or, you know, whenever it could be bothered to...
Arguably, a better way to write the above is to never try to hold onto the String at all; just pass it straight through from where we get it to where it's wanted:
for (int i=0; i<100; i++) {
howYouWantToUseIt(someStringyGettyMethod());
}
But watch out for over-optimising this sort of thing, as readability is almost always more important than compactness.
Most compilers are smarter than we'll ever be, or than I will be at least. They can find all the great shortcuts and minifications that can be done to your code and apply their wizardry in a more magnificent way than we mortals could hope to achieve.
If you try to streamline your code too much, all you're left with is two varieties of unreadable code, instead of one useful, fast, optimised version (the compiler's) and one maintainable version that Johnny, with the off-putting habit of sniffling every 25 seconds two desks over, can follow.
this won't work in Java because I've already declared and assigned myString
You are wrong: it will still work, but each time you append to the string it will generate a new string.
If you don't want to generate a new string when you append/add to it, then StringBuilder is the solution.
sample:
public static void main(String args[]) {
StringBuilder sb = new StringBuilder("hey");
for (int i = 1; i <= 9; i++) {
sb.append("hey");
}
}
Being immutable doesn't mean that it won't work; it just means that the object you created won't be modified. The String variable can still be assigned to another object (the new string created by concatenating the previous strings on += "hey").
If you want to do it like a mutable object, then just use StringBuilder append() method.
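A quick way to see the difference described above is to compare object identity with `==` (a sketch; the class and method names are made up for illustration): `+=` on a String yields a different object, whereas `StringBuilder.append()` modifies the builder and returns that same builder.

```java
public class MutableVsImmutable {
    /** True if `s += suffix` produced a different object from the original. */
    static boolean concatCreatesNewObject(String s, String suffix) {
        String before = s;
        s += suffix;          // builds a brand-new String and re-points s at it
        return s != before;   // identity comparison, not equals()
    }

    /** True if append() modified the builder in place (it returns `this`). */
    static boolean appendReturnsSameBuilder(StringBuilder sb, String suffix) {
        return sb.append(suffix) == sb;
    }

    public static void main(String[] args) {
        System.out.println(concatCreatesNewObject("hey", "hey"));                     // true
        System.out.println(appendReturnsSameBuilder(new StringBuilder("hey"), "hey")); // true
    }
}
```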
While Strings in Java are immutable, your first example above will work because it creates a new String object every time through the loop, and assigns the newly created string to myString:
public static void main(String[] args) {
String myString = "hey";
for (int i = 1; i <= 9; i++) {
myString += "hey";
}
System.out.println(myString); // prints "heyheyheyheyheyheyheyheyheyhey"
}
While this works, it's inefficient due to the object creation. For a loop with more iterations, or when concatenating longer strings, performance might be a concern – so there are better ways to do it.
One better approach to concatenating Strings in Java is to use StringBuilder. Here's your code adapted to use StringBuilder:
public static void main(String[] args) {
StringBuilder builder = new StringBuilder(50); // estimated length
for (int i = 1; i <= 9; i++) {
builder.append("hey");
}
String myString = builder.toString(); // convert to String when needed
System.out.println(myString); // prints "heyheyheyhey..."
}
StringBuilder has a backing array as a buffer, which is expanded whenever the appended length exceeds the size of the buffer. In this case, we start with an initial allocation of 50 characters.
In a real world situation, you could set the initial size of the StringBuilder buffer based on the size of the input, to minimise the need for expensive buffer expansions.
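As a sketch of sizing the buffer from the input, the hypothetical `joinLines` helper below computes the exact final length first and passes it as the initial capacity, so the backing array should not need to expand while appending.

```java
import java.util.Arrays;
import java.util.List;

public class PresizedBuilder {
    /** Joins parts, each followed by '\n', into one String. */
    static String joinLines(List<String> parts) {
        int capacity = 0;
        for (String p : parts) {
            capacity += p.length() + 1; // +1 for the trailing '\n'
        }
        StringBuilder sb = new StringBuilder(capacity); // exact size: no buffer expansion needed
        for (String p : parts) {
            sb.append(p).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(joinLines(Arrays.asList("hey", "hey", "hey")));
    }
}
```

For everyday joining, `String.join` (Java 8+) does a similar job; computing the capacity by hand is mainly worth it in hot loops over large inputs.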
When you execute the working code in your question, you will simply create a new string in memory each time you append to it.
This means that every time you append something to your string it will be a new string object in memory, implying that it also has a new memory address.
This is because strings are immutable indeed.
If you only want to create a string object once, you should use a StringBuilder, otherwise this solution works fine.
StringBuilders are recommended for building strings that you will (as you do) modify a lot, because modifying a string a lot (i.e., creating many new strings) does a lot of reading and writing in memory.
Your code works perfectly fine, although it's not recommended to work on strings like you do.
Have a look at Java's StringBuilder; the API docs for its thread-safe sibling StringBuffer describe the same methods: http://docs.oracle.com/javase/7/docs/api/java/lang/StringBuffer.html
With the aid of a StringBuilder, you can modify the string.
Related
Problem
I wrote 2 programs, one in Delphi and one in Java, for string concatenation and I noticed a much faster string concatenation in Delphi compared to Java.
Java
String str = new String();
long t0 = System.nanoTime();
for (int i = 0; i < 50000; i++)
str += "abc";
long t1 = System.nanoTime();
System.out.println("String + String needed " + (t1 - t0) / 1000000 + "ms");
Delphi
Stopwatch.Start;
for i := 1 to 50000 do
str := str + 'abc';
Stopwatch.Stop;
ShowMessage('Time in ms: ' + IntToStr(Stopwatch.ElapsedMilliseconds));
Question
Both measure the time in milliseconds, but the Delphi program is much faster: 1 ms vs. Java's 2 seconds. Why is string concatenation so much faster in Delphi?
Edit: Looking back at this question with more experience I should have come to the conclusion that the main difference comes from Delphi being compiled and Java being compiled and then run in the JVM.
TLDR
There may be other factors, but certainly a big contributor is likely to be Delphi's default memory manager. It's designed to be a little wasteful of space in order to reduce how often memory is reallocated.
Considering memory manager overhead
When you have a straight-forward memory manager (you might even call it 'naive'), your loop concatenating strings would actually be more like:
//pseudo-code
for I := 1 to 50000 do
begin
if CanReallocInPlace(Str) then
//Great when True; but this might not always be possible.
ReallocMem(Str, Length(Str) + Length(Txt))
else
begin
AllocMem(NewStr, Length(Str) + Length(Txt))
Copy(Str, NewStr, Length(Str))
FreeMem(Str)
Str := NewStr
end;
Copy(Txt, Str[Length(Str)-Length(Txt)], Length(Txt))
end;
Notice that on every iteration you increase the allocation. And if you're unlucky, you very often have to:
Allocate memory in a new location
Copy the existing 'string so far'
Finally release the old string
Delphi (and FastMM)
However, Delphi has switched from the default memory manager used in its early days to a previously 3rd-party one (FastMM) that's designed to run faster, primarily by:
(1) Using a sub-allocator i.e. getting memory from the OS a 'large' page at a time.
Then performing allocations from the page until it runs out.
And only then getting another page from the OS.
(2) Aggressively allocating more memory than requested (anticipating small growth).
Then it becomes more likely that a slightly larger request can be reallocated in place.
These techniques can (though it's not guaranteed) increase performance.
But it definitely does waste space. (And with unlucky fragmentation, the wastage can be quite severe.)
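The effect of technique (2) can be illustrated in Java with a toy buffer (a sketch, not FastMM's actual algorithm; `GrowthPolicies` and `countAllocations` are made-up names). It appends "abc" repeatedly and counts how many times a new block has to be allocated and the string so far copied: once growing to exactly the needed size, and once at least doubling the capacity, which is loosely how Java's own StringBuilder grows.

```java
import java.util.Arrays;

public class GrowthPolicies {
    /** Appends `piece` `times` times; returns how many allocations (including the first) were needed. */
    static int countAllocations(String piece, int times, boolean overAllocate) {
        char[] buf = new char[0];
        int len = 0;
        int allocations = 0;
        for (int i = 0; i < times; i++) {
            int needed = len + piece.length();
            if (needed > buf.length) {
                // Exact policy: grow to exactly what's needed (like the naive manager).
                // Over-allocating policy: at least double the capacity.
                int newCapacity = overAllocate ? Math.max(needed, buf.length * 2) : needed;
                buf = Arrays.copyOf(buf, newCapacity); // new block + copy of the string so far
                allocations++;
            }
            piece.getChars(0, piece.length(), buf, len);
            len = needed;
        }
        return allocations;
    }

    public static void main(String[] args) {
        System.out.println(countAllocations("abc", 10_000, false)); // 10000
        System.out.println(countAllocations("abc", 10_000, true));  // 15
    }
}
```

The doubling policy needs only 15 allocations instead of 10,000, at the cost of up to half the final buffer sitting unused: the same space-for-speed trade-off described above.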
Conclusion
Certainly the simple app you wrote to demonstrate the performance greatly benefits from the new memory manager. You run through a loop that incrementally reallocates the string on every iteration. Hopefully with as many in-place allocations as possible.
You could attempt to circumvent some of FastMM's performance improvements by forcing additional allocations in the loop. (Though sub-allocation of pages would still be in effect.)
So the simplest approach would be to try an older Delphi compiler (such as D5) to demonstrate the point.
FWIW: String Builders
You said you "don't want to use the String Builder". However, I'd like to point out that a string builder obtains similar benefits. Specifically (if implemented as intended): a string builder wouldn't need to reallocate the substrings all the time. When it comes time to finally build the string; the correct amount of memory can be allocated in a single step, and all portions of the 'built string' copied to where they belong.
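For comparison, here is the question's Java loop rewritten with a StringBuilder (a sketch; `ConcatTiming` and `buildAbc` are illustrative names). No timings are claimed here: a single `System.nanoTime()` measurement like this is only a rough guide, since JIT warm-up affects the first run.

```java
public class ConcatTiming {
    /** The question's loop, appending into one pre-sized StringBuilder. */
    static String buildAbc(int times) {
        StringBuilder sb = new StringBuilder(times * 3); // "abc" is 3 chars
        for (int i = 0; i < times; i++) {
            sb.append("abc");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        String str = buildAbc(50000);
        long t1 = System.nanoTime();
        System.out.println("StringBuilder needed " + (t1 - t0) / 1_000_000 + "ms for "
                + str.length() + " chars");
    }
}
```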
In Java (and C#) strings are immutable objects. That means that if you have:
string s = "String 1";
then the compiler allocates memory for this string. Then
s = s + " String 2";
gives us "String 1 String 2" as expected, but because strings are immutable, a new string is allocated with exactly the size needed to hold "String 1 String 2", and the contents of both strings are copied to the new location. The old strings are then left for the garbage collector to delete. In Delphi a string is more "copy-on-write" and reference counted, which is much faster.
C# and Java have the class StringBuilder, which behaves a lot like Delphi strings and is much faster when modifying and manipulating strings.
Here is the function:
String gen() {
String raw_content = "";
String line;
for (int i=0; i<10000; i++) {
line = randString();
raw_content += line + "\n";
}
return raw_content;
}
When I call gen() 100 times in main(), my program gets stuck.
I suspect this is related to memory leak caused by Java String. So will the no-longer used memory be freed by JVM automatically? How to fix this?
Thanks!
To make a long story short, in java (and other JVM languages), you don't have to care about memory allocation at all. You really shouldn't be worrying about it - at some time after all references to it have been lost, it'll be freed up by the garbage collecting thread. See: Garbage Collection in Java.
Your problem has less to do with memory and more to do with your function just being really time intensive (as Hot Licks said in a comment). Strings in Java are immutable, so when you say raw_content += line + "\n"; you're really creating a new string of raw_content + line + "\n" and setting raw_content equal to that. If randString() returns long results, this will become an egregiously long string. If you really want to perform this function, StringBuilders are the way to go: they at least reduce it from O(N^2) to O(N). If you're just looking for a memory exercise, you don't have to actually make any changes; just read the above article.
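A sketch of the StringBuilder version of `gen()`. The question doesn't show `randString()`, so the one below is a hypothetical stand-in returning 8 random lowercase letters; only the loop body changes from the original.

```java
import java.util.Random;

public class Gen {
    private static final Random RANDOM = new Random();

    // Hypothetical stand-in for the question's randString(): 8 random lowercase letters.
    static String randString() {
        char[] chars = new char[8];
        for (int i = 0; i < chars.length; i++) {
            chars[i] = (char) ('a' + RANDOM.nextInt(26));
        }
        return new String(chars);
    }

    // Same output shape as the question's gen(), but each line is appended to one
    // buffer instead of re-copying everything built so far on every iteration.
    static String gen() {
        StringBuilder rawContent = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            rawContent.append(randString()).append('\n');
        }
        return rawContent.toString();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            gen(); // each call now does work proportional to its output length
        }
    }
}
```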
For a school assignment, I need to create a Simulation for memory accesses. First I need to read 1 or more trace files. Each contains memory addresses for each access. Example:
0 F001CBAD
2 EEECA89F
0 EBC17910
...
Where the first integer indicates a read/write etc., then the hex memory address follows. With this data, I am supposed to run a simulation. So the idea I had was to parse this data into an ArrayList<Trace> (for now I am using Java), with Trace being a simple class containing the memory address and the access type (just a String and an integer). After that I plan to loop through these array lists to process them.
The problem is that even while parsing, it's running out of heap space. Each trace file is ~200MB, and I have up to 8, meaning a minimum of ~1.6 GB of data I am trying to "cache"? What baffles me is that I am only parsing 1 file and Java is using 2GB according to my task manager...
What is a better way of doing this?
A code snippet can be found at Code Review
The answer I gave on codereview is the same one you should use here .....
But, because duplication appears to be OK, I'll duplicate the answer here.
The issue is almost certainly in the structure of your Trace class and its memory efficiency. You should ensure that instrType and hexAddress are stored as memory-efficient structures. instrType appears to be an int, which is good; just make sure that it is declared as an int in the Trace class.
The more likely problem is the size of the hexAddress String. You may not realise it but Strings are notorious for 'leaking' memory. In this case, you have a line and you think you are just getting the hexString from it... but in reality, the hexString contains the entire line.... yeah, really. For example, look at the following code:
import java.util.StringTokenizer;
public class SToken {
public static void main(String[] args) {
StringTokenizer tokenizer = new StringTokenizer("99 bottles of beer");
int instrType = Integer.parseInt(tokenizer.nextToken());
String hexAddr = tokenizer.nextToken();
System.out.println(instrType + hexAddr);
}
}
Now set a breakpoint in your IDE (I use Eclipse) and run it. On JVMs before Java 7 update 6, you will see that hexAddr contains a char[] array for the entire line, with an offset of 3 and a count of 7.
Because of the way that String.substring() and similar constructs worked, they could consume huge amounts of memory for short strings... (in theory that memory is shared with other strings, though). As a consequence, you are essentially storing the entire file in memory!!!!
At a minimum, you should change your code to:
hexAddr = new String(tokenizer.nextToken().toCharArray());
But even better would be:
long hexAddr = parseHexAddress(tokenizer.nextToken());
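A sketch of what the memory-efficient Trace could look like, assuming the line format from the question ("0 F001CBAD"). `parseHexAddress` is only named in the answer, so its body here (`Long.parseLong(hex, 16)`) is an assumption; storing the address as a primitive `long` means no String or char[] is retained per trace at all.

```java
import java.util.StringTokenizer;

public class Trace {
    final int instrType; // access type from the first column
    final long address;  // the hex address as a primitive: 8 bytes, no String kept

    Trace(int instrType, long address) {
        this.instrType = instrType;
        this.address = address;
    }

    // Hypothetical implementation of the parseHexAddress() mentioned above.
    static long parseHexAddress(String hex) {
        return Long.parseLong(hex, 16); // 32-bit addresses such as F001CBAD fit in a long
    }

    // Parses one trace line such as "0 F001CBAD".
    static Trace parseLine(String line) {
        StringTokenizer tokenizer = new StringTokenizer(line);
        int instrType = Integer.parseInt(tokenizer.nextToken());
        long address = parseHexAddress(tokenizer.nextToken());
        return new Trace(instrType, address);
    }

    public static void main(String[] args) {
        Trace t = parseLine("0 F001CBAD");
        System.out.println(t.instrType + " " + Long.toHexString(t.address)); // prints "0 f001cbad"
    }
}
```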
Like rolfl I answered your question in the code review. The biggest issue, to me, is the reading everything into memory first and then processing. You need to read a fixed amount, process that, and repeat until finished.
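The simplest form of read-process-repeat is to handle one line at a time, so only the current line is ever held in memory. In the sketch below the "simulation" is a hypothetical stand-in that just counts accesses per type; a `StringReader` is used so it runs without a file, and for a real trace you would pass something like `new FileReader(path)` instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

public class StreamingTraces {
    /** Streams trace lines and counts accesses per type, never storing the lines. */
    static Map<Integer, Integer> countByType(Reader source) {
        Map<Integer, Integer> counts = new TreeMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                int type = Integer.parseInt(line.substring(0, line.indexOf(' ')));
                counts.merge(type, 1, Integer::sum); // "process" the access, then forget the line
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        Reader trace = new StringReader("0 F001CBAD\n2 EEECA89F\n0 EBC17910\n");
        System.out.println(countByType(trace)); // prints "{0=2, 2=1}"
    }
}
```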
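Try using java.nio.ByteBuffer instead of java.util.ArrayList<Trace>. It should also reduce the memory usage.
import java.nio.ByteBuffer;
class TraceList {
private static final int RECORD_SIZE = 5; // 1 byte for the type + 4 bytes for the address
private final ByteBuffer buffer;
public TraceList(int maxRecords) {
buffer = ByteBuffer.allocate(maxRecords * RECORD_SIZE); // allocate byte buffer
}
public void put(byte operationType, int address) {
buffer.put(operationType).putInt(address); // relative puts: append at the current position
}
public Trace get(int index) {
int offset = index * RECORD_SIZE; // absolute gets: read by index, position unchanged
byte type = buffer.get(offset);
int address = buffer.getInt(offset + 1);
return new Trace(type, address); // assumes a Trace(byte, int) constructor
}
}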
I found out that my program's memory keeps increasing because of the code below. I am currently reading a file that is about 7GB, and I believe the data that would be stored in the HashSet is less than 10M, but my program's memory keeps increasing to 300MB and then it crashes with an OutOfMemoryError. If it is the HashSet problem, which data structure should I choose?
if(tagsStr!=null) {
if(tagsStr.contains("a")||tagsStr.contains("b")||tagsStr.contains("c")) {
maTable.add(postId);
}
} else {
if(maTable.contains(parentId)) {
//do sth else, no memories added here
}
}
You haven't really told us what you're doing, but:
If your file is currently in something like ASCII, each character you read will be one byte in the file but two bytes in memory (at least before Java 9, whose compact strings store Latin-1 text at one byte per character).
Each string will have an object overhead - this can be significant if you're storing lots of small strings
If you're reading lines with BufferedReader (or taking substrings from large strings), each one may have a large backing buffer (substrings shared their parent's buffer before Java 7 update 6); you may want to use maTable.add(new String(postId)) to avoid this
Each entry in the hash set needs a separate object to keep the key/hashcode/value/next-entry values. Again, with a lot of entries this can add up
In short, it's quite possible that you're doing nothing wrong, but a combination of memory-increasing factors are working against you. Most of these are unavoidable, but the third one may be relevant.
You've either got a memory leak or your understanding of the amount of string data that you are storing is incorrect. We can't tell which without seeing more of your code.
The scientific solution is to run your application using a memory profiler, and analyze the output to see which of your data structures is using an unexpectedly large amount of memory.
If I was to guess, it would be that your application (at some level) is doing something like this:
String line;
while ((line = br.readLine()) != null) {
// search for tag in line
String tagStr = line.substring(pos1, pos2);
// code as per your example
}
On JVMs before Java 7 update 6, this uses a lot more memory than you'd expect. The substring(...) call creates a tagStr object that refers to the backing array of the original line string, so your tag strings that you expect to be short actually refer to a char[] object that holds all the characters in the original line.
The fix is to do this:
String tagStr = new String(line.substring(pos1, pos2));
This creates a String object that does not share the backing array of the argument String.
UPDATE - this or something similar is an increasingly likely explanation ... given your latest data.
To expand on another of Jon Skeet's points, the overhead of a small String is surprisingly high. For instance, on a typical 32-bit JVM, the memory usage of a one-character String is:
String object header for String object: 2 words
String object fields: 3 words
Padding: 1 word (I think)
Backing array object header: 3 words
Backing array data: 1 word
Total: 10 words - 40 bytes - to hold one char of data ... or one byte of data if your input is in an 8-bit character set.
(This is not sufficient to explain your problem, but you should be aware of it anyway.)
Couldn't it be possible that the data read into memory (from the 7 GB file) is somehow not freed? Something like what Jon says, i.e. since strings are immutable, every string read requires a new String object, which might lead to running out of memory if the GC is not quick enough...
If that is the case, then you might insert some 'breakpoints' into your code/iteration, i.e. at some defined points, issue a GC and wait until it finishes.
Run your program with -XX:+HeapDumpOnOutOfMemoryError. You'll then be able to use a memory analyser like MAT to see what is using up all of the memory - it may be something completely unexpected.
So I'm using Java to do multi-way external merge sorts of large on-disk files of line-delimited tuples. Batches of tuples are read into a TreeSet, which are then dumped into on-disk sorted batches. Once all of the data have been exhausted, these batches are then merge-sorted to the output.
Currently I'm using magic numbers to figure out how many tuples we can fit into memory. This is based on a static figure indicating roughly how many tuples fit per MB of heap space, and on how much heap space is available, using:
long max = Runtime.getRuntime().maxMemory();
long used = Runtime.getRuntime().totalMemory();
long free = Runtime.getRuntime().freeMemory();
long space = free + (max - used);
However, this does not always work so well since we may be sorting different length tuples (for which the static tuple-per-MB figure might be too conservative) and I now want to use flyweight patterns to jam more in there, which may make the figure even more variable.
So I'm looking for a better way to fill the heap-space to the brim. Ideally the solution should be:
reliable (no risk of heap-space exceptions)
flexible (not based on static numbers)
efficient (e.g., not polling runtime memory estimates after every tuple)
Any ideas?
Filling the heap to the brim might be a bad idea due to garbage collector thrashing. (As memory gets nearly full, the efficiency of garbage collection approaches 0, because the effort for a collection depends on heap size, but the amount of memory freed depends on the size of the objects identified as unreachable.)
However, if you must, can't you simply do it as follows?
for (;;) {
long freeSpace = getFreeSpace();
if (freeSpace < 1000000) break;
while (freeSpace > 0) {
treeSet.add(readRecord());
freeSpace -= MAX_RECORD_SIZE;
}
}
The calls to discover the free memory will be rare, so they shouldn't tax performance much. For instance, if you have 1 GB of heap space, leave 1 MB empty, and MAX_RECORD_SIZE is ten times the average record size, each pass only consumes about a tenth of the remaining free space, so getFreeSpace() will be invoked a mere log(1000) / -log(0.9) ~= 66 times.
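A self-contained sketch of the approach above, with the free-space estimate taken from the question's `Runtime` formula. `MAX_RECORD_SIZE`, `MIN_FREE` and `fillBatch` are hypothetical; unlike the answer's loop, this version keeps `MIN_FREE` in reserve rather than budgeting the entire free space, and records are plain Strings standing in for the question's tuples.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.TreeSet;

public class BatchFiller {
    // Assumed worst-case heap footprint of one record (hypothetical value).
    static final long MAX_RECORD_SIZE = 10_000;
    // Headroom we never budget away (hypothetical value).
    static final long MIN_FREE = 1_000_000;

    // Free heap estimate, using the free + (max - total) formula from the question.
    static long getFreeSpace() {
        Runtime rt = Runtime.getRuntime();
        return rt.freeMemory() + (rt.maxMemory() - rt.totalMemory());
    }

    /**
     * Adds records to batch until the input is exhausted (returns false) or
     * free heap drops to MIN_FREE (returns true: dump the batch, then call again).
     */
    static boolean fillBatch(Iterator<String> records, TreeSet<String> batch) {
        while (records.hasNext()) {
            long freeSpace = getFreeSpace();
            if (freeSpace <= MIN_FREE) {
                return true;
            }
            // Spend the measured space in MAX_RECORD_SIZE steps without re-measuring.
            for (long budget = freeSpace - MIN_FREE; budget > 0 && records.hasNext(); budget -= MAX_RECORD_SIZE) {
                batch.add(records.next());
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TreeSet<String> batch = new TreeSet<>();
        boolean more = fillBatch(Arrays.asList("c", "a", "b").iterator(), batch);
        System.out.println(more + " " + batch);
    }
}
```

When `fillBatch` returns true, the caller writes the batch out as a sorted run, clears it and calls again.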
Why bother calculating how many items you can hold? How about letting Java tell you when you've used up all your memory, catching the error and continuing? For example,
// prepare output medium now so we don't need to worry about having enough
// memory once the treeset has been filled.
BufferedWriter writer = new BufferedWriter(new FileWriter("output"));
Set<Object> set = new TreeSet<Object>();
int linesRead = 0;
{
BufferedReader reader = new BufferedReader(new FileReader("input"));
try {
String line = reader.readLine();
while (line != null) {
set.add(parseTuple(line));
linesRead += 1;
line = reader.readLine();
}
// end of file reached
linesRead = -1;
} catch (OutOfMemoryError e) {
// while loop broken
} finally {
reader.close();
}
// since reader and line were declared in a block their resources will
// now be released
}
// output treeset to file
for (Object o: set) {
writer.write(o.toString());
writer.newLine();
}
writer.close();
// use linesRead to find position in file for next pass
// or continue on to next file, depending on value of linesRead
If you still have trouble with memory, just make the reader's buffer extra large so as to reserve more memory.
The default buffer in a BufferedReader holds 8192 chars (16 KB), so when you finish reading you will release at least that much memory. After this your additional memory needs will be minimal. You need enough memory to create an iterator for the set; let's be generous and assume 200 bytes. You will also need memory to store the string output of your tuples (but only temporarily). You say the tuples contain about 200 characters. Let's double that to account for separators: 400 characters, which is 800 bytes. So all you really need is about an additional 1 KB, and you're fine, as you've just released 16 KB.
The reason you don't need to worry about the memory used to store the string output of your tuples is because they are short lived and only referred to within the output for loop. Note that the Writer will copy the contents into its buffer and then discard the string. Thus, the next time the garbage collector runs the memory can be reclaimed.
I've checked, and an OOME in add will not leave a TreeSet in an inconsistent state: the memory allocation for a new Entry (the internal implementation for storing a key/value pair) happens before the internal representation is modified.
You can really fill the heap to the brim using direct memory writing (it does exist in Java!). It's in sun.misc.Unsafe, but isn't really recommended for use. See here for more details. I'd probably advise writing some JNI code instead, and using existing C++ algorithms.
I'll add this as an idea I was playing around with, involving using a SoftReference as a "sniffer" for low memory.
SoftReference<Byte[]> sniffer = new SoftReference<Byte[]>(new Byte[8192]);
while(iter.hasNext()){
tuple = iter.next();
treeset.add(tuple);
if(sniffer.get()==null){
dump(treeset);
treeset.clear();
sniffer = new SoftReference<Byte[]>(new Byte[8192]);
}
}
This might work well in theory, but I don't know the exact behaviour of SoftReference. Its Javadoc says:
All soft references to softly-reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError. Otherwise no constraints are placed upon the time at which a soft reference will be cleared or the order in which a set of such references to different objects will be cleared. Virtual machine implementations are, however, encouraged to bias against clearing recently-created or recently-used soft references.
Would like to hear feedback as it seems to me like an elegant solution, although behaviour might vary between VMs?
Testing on my laptop, I found that the soft reference is cleared infrequently but is sometimes cleared too early, so I'm thinking of combining it with meriton's answer:
SoftReference<Byte[]> sniffer = new SoftReference<Byte[]>(new Byte[8192]);
while(iter.hasNext()){
tuple = iter.next();
treeset.add(tuple);
if(sniffer.get()==null){
free = MemoryManager.estimateFreeSpace();
if(free < MIN_SAFE_MEMORY){
dump(treeset);
treeset.clear();
sniffer = new SoftReference<Byte[]>(new Byte[8192]);
}
}
}
Again, thoughts welcome!
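A self-contained sketch of the combined sniffer idea. `MemoryManager.estimateFreeSpace()` isn't shown in the post, so it is replaced by the `Runtime` formula from the question, and `MIN_SAFE_MEMORY` is a made-up value. Two deliberate changes from the snippet: the sniffer holds a plain `byte[]`, and it is re-armed even when the estimate says memory is fine, so the estimate isn't polled after every subsequent tuple. As the post says, when the GC clears soft references varies between VMs.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

public class SnifferBatcher {
    // Hypothetical threshold; the post gives no value for MIN_SAFE_MEMORY.
    static final long MIN_SAFE_MEMORY = 16L * 1024 * 1024;

    // Stand-in for MemoryManager.estimateFreeSpace(): free + (max - total).
    static long estimateFreeSpace() {
        Runtime rt = Runtime.getRuntime();
        return rt.freeMemory() + (rt.maxMemory() - rt.totalMemory());
    }

    // Receives each full batch, e.g. to write it out as a sorted run on disk.
    interface BatchSink {
        void dump(TreeSet<String> batch);
    }

    static void run(Iterator<String> iter, BatchSink sink) {
        TreeSet<String> treeset = new TreeSet<>();
        SoftReference<byte[]> sniffer = new SoftReference<>(new byte[8192]);
        while (iter.hasNext()) {
            treeset.add(iter.next());
            if (sniffer.get() == null) {
                // The GC cleared the sniffer, so memory may be getting tight:
                // confirm with the free-space estimate before dumping.
                if (estimateFreeSpace() < MIN_SAFE_MEMORY) {
                    sink.dump(treeset);
                    treeset.clear();
                }
                sniffer = new SoftReference<>(new byte[8192]); // re-arm in both cases
            }
        }
        if (!treeset.isEmpty()) {
            sink.dump(treeset); // flush the last, partial batch
        }
    }

    public static void main(String[] args) {
        List<String> tuples = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            tuples.add("tuple-" + i);
        }
        long[] dumped = {0};
        run(tuples.iterator(), batch -> dumped[0] += batch.size());
        System.out.println(dumped[0]); // prints "100000": every tuple ends up in some batch
    }
}
```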