Does the Java String in the following code leak memory? How to fix it - java

Here is the function:
String gen() {
    String raw_content = "";
    String line;
    for (int i=0; i<10000; i++) {
        line = randString();
        raw_content += line + "\n";
    }
    return raw_content;
}
When I call gen() 100 times in main(), my program gets stuck.
I suspect this is related to a memory leak caused by Java String. Will the no-longer-used memory be freed by the JVM automatically? How can I fix this?
Thanks!

To make a long story short, in Java (and other JVM languages) you generally don't have to free memory yourself. At some time after all references to an object have been lost, it'll be freed by the garbage collector. See: Garbage Collection in Java.
Your problem has less to do with memory and more to do with your function being really time intensive (as Hot Licks said in a comment). Strings in Java are immutable, so when you write raw_content += line + "\n"; you're really creating a new string of raw_content + line + "\n" and assigning it to raw_content. Every iteration copies everything accumulated so far, so the total work grows quadratically with the number of lines, and if randString() returns long results the string becomes egregiously long. If you really want to perform this function, a StringBuilder is the way to go, reducing the work from O(N^2) to O(N). If you're just looking for a memory exercise, you don't have to make any changes - just read the above article.
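As a rough sketch of the StringBuilder approach described above: the question doesn't show randString(), so the version below is a hypothetical stand-in that returns 8 random lowercase letters, and the line count is passed as a parameter purely for illustration.

```java
import java.util.Random;

class GenWithStringBuilder {
    private static final Random RANDOM = new Random();

    // Hypothetical stand-in for the question's randString():
    // returns 8 random lowercase letters.
    static String randString() {
        StringBuilder sb = new StringBuilder(8);
        for (int i = 0; i < 8; i++) {
            sb.append((char) ('a' + RANDOM.nextInt(26)));
        }
        return sb.toString();
    }

    // Same output shape as the original gen(), but every line is appended
    // to one growing buffer instead of copying the whole string each time.
    static String gen(int lines) {
        StringBuilder rawContent = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            rawContent.append(randString()).append('\n');
        }
        return rawContent.toString();
    }

    public static void main(String[] args) {
        // Mirrors the question: call gen() 100 times.
        for (int call = 0; call < 100; call++) {
            String content = gen(10000);
            if (call == 0) {
                System.out.println("characters per call: " + content.length());
            }
        }
    }
}
```

Each String returned by gen() becomes garbage as soon as the caller drops it, so nothing here needs to be freed by hand.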

Related

Best practice to convert a Double to a String

I am currently using
Double a = 0.00;
for(condition)
//Do things
String result = "" + a;
Would using
String result = a.toString();
Provide any real benefit compared to what I have now. Does this just help the compiler or are there any differences between the two methods?
The first version - String result = "" + a under the hood is the same as String result = "" + a.toString();. Whenever there is a concatenation of String + Object the toString method is called.
What is the best practice here? What looks better for you. I'd probably go with the first version.
If you're concerned about the performance of both - String result = a.toString(); on paper will be faster because you don't need to create / get an empty String just to create a new one. However, as with many things in Java, something like that most likely gets optimized by JIT compiler anyway so I wouldn't worry about it too much. Even if it doesn't you shouldn't worry about optimization prematurely - if your code runs slowly then usually there is something else wrong with it that is much bigger than that.
I think the second option is better because string concatenation costs more memory. Since Strings are immutable, the first way uses memory to store a Double object plus two String objects.
The second option creates only one new String object, so there will only be one Double object plus one String object in memory.
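A small sketch comparing the conversions discussed above. The class and method names are made up for illustration. One practical difference that doesn't depend on performance: a.toString() throws a NullPointerException when a is null, while "" + a and String.valueOf(a) produce the text "null".

```java
class DoubleToStringDemo {
    // Concatenation: the compiler converts the Double to text for us.
    static String viaConcat(Double a) {
        return "" + a;
    }

    // Explicit call: fails with NullPointerException if a is null.
    static String viaToString(Double a) {
        return a.toString();
    }

    // Null-safe explicit conversion.
    static String viaValueOf(Double a) {
        return String.valueOf(a);
    }

    public static void main(String[] args) {
        Double a = 0.00;
        System.out.println(viaConcat(a));   // 0.0
        System.out.println(viaToString(a)); // 0.0
        System.out.println(viaValueOf(a));  // 0.0
    }
}
```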

Why is string concatenation faster in Delphi than in Java?

Problem
I wrote 2 programs, one in Delphi and one in Java, for string concatenation and I noticed a much faster string concatenation in Delphi compared to Java.
Java
String str = new String();
long t0 = System.nanoTime();
for (int i = 0; i < 50000; i++)
str += "abc";
long t1 = System.nanoTime();
System.out.println("String + String needed " + (t1 - t0) / 1000000 + "ms");
Delphi
Stopwatch.Start;
for i := 1 to 50000 do
str := str + 'abc';
Stopwatch.Stop;
ShowMessage('Time in ms: ' + IntToStr(Stopwatch.ElapsedMilliseconds));
Question
Both measure the time in milliseconds but the Delphi program is much faster with 1ms vs. Javas 2 seconds. Why is string concatenation so much faster in Delphi?
Edit: Looking back at this question with more experience I should have come to the conclusion that the main difference comes from Delphi being compiled and Java being compiled and then run in the JVM.
TLDR
There may be other factors, but certainly a big contributor is likely to be Delphi's default memory manager. It's designed to be a little wasteful of space in order to reduce how often memory is reallocated.
Considering memory manager overhead
When you have a straight-forward memory manager (you might even call it 'naive'), your loop concatenating strings would actually be more like:
//pseudo-code
for I := 1 to 50000 do
begin
  if CanReallocInPlace(Str) then
    //Great when True; but this might not always be possible.
    ReallocMem(Str, Length(Str) + Length(Txt))
  else
  begin
    AllocMem(NewStr, Length(Str) + Length(Txt))
    Copy(Str, NewStr, Length(Str))
    FreeMem(Str)
  end;
  Copy(Txt, NewStr[Length(NewStr)-Length(Txt)], Length(Txt))
end;
Notice that on every iteration you increase the allocation. And if you're unlucky, you very often have to:
Allocate memory in a new location
Copy the existing 'string so far'
Finally release the old string
Delphi (and FastMM)
However, Delphi has switched from the default memory manager used in its early days to a previously 3rd party one (FastMM) that's designed to run faster primarily by:
(1) Using a sub-allocator i.e. getting memory from the OS a 'large' page at a time.
Then performing allocations from the page until it runs out.
And only then getting another page from the OS.
(2) Aggressively allocating more memory than requested (anticipating small growth).
Then it becomes more likely that a slightly larger request can be reallocated in-place.
These techniques can, though it's not guaranteed, increase performance.
But it definitely does waste space. (And with unlucky fragmentation, the wastage can be quite severe.)
Conclusion
Certainly the simple app you wrote to demonstrate the performance greatly benefits from the new memory manager. You run through a loop that incrementally reallocates the string on every iteration. Hopefully with as many in-place allocations as possible.
You could attempt to circumvent some of FastMM's performance improvements by forcing additional allocations in the loop. (Though sub-allocation of pages would still be in effect.)
So simplest would be to try an older Delphi compiler (such as D5) to demonstrate the point.
FWIW: String Builders
You said you "don't want to use the String Builder". However, I'd like to point out that a string builder obtains similar benefits. Specifically (if implemented as intended): a string builder wouldn't need to reallocate the substrings all the time. When it comes time to finally build the string; the correct amount of memory can be allocated in a single step, and all portions of the 'built string' copied to where they belong.
In Java (and C#) strings are immutable objects. That means that if you have:
string s = "String 1";
then memory is allocated for this string. If you then write
s = s + " String 2"
gives us "String 1 String 2" as expected, but because strings are immutable, a new string is allocated with exactly the size needed to contain "String 1 String 2", and the content of both strings is copied to the new location. The original strings are then reclaimed by the garbage collector once nothing references them. In Delphi a string is reference counted with copy-on-write semantics, which is much faster.
C# and Java have the StringBuilder class, which behaves a lot like Delphi strings and is considerably faster when modifying and manipulating strings.
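For comparison, here is a minimal Java sketch of the benchmark from the question next to a StringBuilder version. The class and method names are made up, and the timings it prints are only indicative since this is not a proper benchmark (no warm-up, single run).

```java
class ConcatVsBuilder {
    // Repeated +=: each iteration allocates a new String and copies the old one.
    static String withPlus(int n) {
        String str = "";
        for (int i = 0; i < n; i++) {
            str += "abc";
        }
        return str;
    }

    // StringBuilder: appends into one buffer that grows only occasionally.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("abc");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 50000;

        long t0 = System.nanoTime();
        String a = withPlus(n);
        long t1 = System.nanoTime();
        String b = withBuilder(n);
        long t2 = System.nanoTime();

        System.out.println("String + String needed " + (t1 - t0) / 1000000 + "ms");
        System.out.println("StringBuilder needed " + (t2 - t1) / 1000000 + "ms");
        System.out.println("same result: " + a.equals(b));
    }
}
```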

Repeatedly updating string in Java

So I understand that strings in Java are immutable. I'm interested in how to repeatedly update a certain string.
Ex:
public static void main(String[] args) {
String myString = "hey";
for (int i = 1; i <= 9; i++) {
myString += "hey";
}
}
Now, this won't work in Java because I've already declared and assigned myString. How do people get around Java's immutable strings (as in the above example)?
The only thing I can think to do is declare another string. Unfortunately, this just delays my problem, as the second time through the loop, I'll be reassigning an already assigned string:
public static void main(String[] args) {
String myString = "hey";
String secondString;
for (int i = 1; i <= 10; i++) {
secondString += "hey";
}
}
Any suggestions / explanations are much appreciated!
Thanks,
Mariogs
A Quick Answer
You should use a StringBuilder for this sort of thing. It is designed to put Strings together without constantly copying or holding on to older Strings.
public class SimpleGrowingString {
    private StringBuilder stringBuilder;

    public SimpleGrowingString() {
        this.stringBuilder = new StringBuilder();
    }

    // Appends str to the text built so far.
    public void addToString(String str) {
        this.stringBuilder.append(str);
    }

    // Returns the accumulated text as an immutable String.
    public String getString() {
        return this.stringBuilder.toString();
    }
}
A Not So Quick Answer:
Immutable?
While Strings are immutable, you can re-assign a String variable.
The variable will then reference (point to) the new String assigned to it and the old value will be marked for Garbage Collection and only hang about in RAM until the Garbage Collector gets off its arse and around to clearing it out. That is, as long as there are no other references to it (or subsections of it) still about somewhere.
Immutable means that you cannot change a String itself, not that you cannot reassign the variable that was pointing to it.
eg.
String str = "string one";
The String "string one" exists in memory and can not be changed, modified, cut up, added to etc.
It is immutable.
If I then say:
str = "a different string";
Then the variable str now references a different piece of data in memory; the String "a different string".
The original String "string one" is still the exact same String that it was before; we've just told the handle we had for it to point to something else. The old String is still floating around in memory, but now it's headless and we no longer have any way to actually access that value.
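The difference between reassigning a variable and changing a String can be seen in a small sketch (the class name and the extra alias variable are only there for illustration): a second handle to the original String still sees the old value after str is pointed elsewhere.

```java
class ReassignDemo {
    // Returns { value of str after reassignment, value still seen via alias }.
    static String[] reassign() {
        String str = "string one";
        String alias = str;          // second handle to the same String object
        str = "a different string";  // repoints str; the old object is untouched
        return new String[] { str, alias };
    }

    public static void main(String[] args) {
        String[] result = reassign();
        System.out.println(result[0]); // a different string
        System.out.println(result[1]); // string one
    }
}
```

Without the alias, "string one" would have no remaining handle after the reassignment.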
This leads to the idea of Garbage Collection.
Garbage. Garbage Everywhere.
The Garbage Collector runs every now and again and cleans out old, unnecessary data that's no longer being used.
It decides what is and isn't useful, among other ways, by checking if there are any valid handles/variables currently pointing at the data. If there's nothing using it and there's no way for us to even access it anymore it's useless to us and it gets dumped.
But you can't really ever rely on the Garbage Collector to clean out things on time, quickly or even get it to run when you want it to. It does its own thing, in its own time. You are better off trying to minimise its workload than assuming it's going to clean up after you all the time.
And now that you have an (admittedly very basic) grounding in Garbage Collection, we can talk about why you don't add Strings together:
String con+cat+en+a+tion
One big issue with using + for Strings (and the reason that StringBuilder and StringBuffer were designed) is that it creates an awful lot of Strings. Strings all over the place! They may end up as candidates for Garbage Collection relatively quickly depending on your usage, but they can still lead to bloat if handled incorrectly, especially when loops get involved. The Garbage Collector runs whenever it damn well feels like it and we can't really control that, so we can't say that things are not getting out of hand unless we ourselves stop them getting that way.
Doing the simple String concatenation of:
"a String" + " another String"
actually leads to having three Strings in memory:
"a String", " another String" and "a String another String"
Expand this out to a simple, fairly short loop and you can see how things can get out of hand pretty quickly:
String str = "";
for (int i=0; i<=6; i++) {
str += "a chunk of RAM ";
}
Which means that at each iteration we have in memory:
0:
"a chunk of RAM "
1:
"a chunk of RAM "
"a chunk of RAM a chunk of RAM"
2:
"a chunk of RAM "
"a chunk of RAM a chunk of RAM"
"a chunk of RAM a chunk of RAM a chunk of RAM"
3:
"a chunk of RAM "
"a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM "
4:
"a chunk of RAM "
"a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM"
5:
"a chunk of RAM "
"a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM"
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM "
6:
"a chunk of RAM "
"a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM"
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM "
"a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM a chunk of RAM "
And so on and so on... You can see where that's going and how quickly it's getting there.
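To put a number on the listing above, here is a small sketch (class and method names are made up) that adds up the lengths of every intermediate String the loop creates. In practice the older ones may already have been collected, so this is an upper bound on what was allocated, not on what is live at any one moment.

```java
class ConcatCost {
    // Sums the lengths of all Strings produced by repeated +=.
    static long charsCreated(String chunk, int iterations) {
        String str = "";
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            str += chunk;          // new String, old contents copied
            total += str.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // The 7-iteration loop from the example: 15 * (1 + 2 + ... + 7) characters.
        System.out.println(charsCreated("a chunk of RAM ", 7));    // 420
        // The same loop run 1000 times.
        System.out.println(charsCreated("a chunk of RAM ", 1000)); // 7507500
    }
}
```

The final String is only 15 * n characters long, while the total allocated grows with n squared.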
Moral of the story
If you are looping and combining Strings, use a StringBuilder or StringBuffer; it's what they were made for.
"Sidenote".substring(4);
Concatenating Strings isn't the only way to end up with a lot of wasted RAM because of String's immutability.
Just as we can't add to the end of a String neither can we chop off the tail.
String moderatelyLongString = "a somewhat lengthy String that rambles on and on about nothing in particular for quite a while even though nobody's listening";
If perhaps we wanted to make use of only a part of this String we could use the String.substring() method:
String snippedString = moderatelyLongString.substring(0, 13);
System.out.println(snippedString);
>> a somewhat le
Ok, so what's wrong with that?
Well, if we wanted to dump the first, long String but hang onto the short bit you might think that we can just say:
moderatelyLongString = null;
You may think that will leave the long String abandoned and alone, crying in a corner waiting for the GC to come and kick it out into the cold but you'd be wrong.
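Since we've still got a hold of a couple of characters of the longer chain in snippedString, the entire moderatelyLongString stays on the heap wasting space. (This is the behaviour of Java versions before 7u6, where substring() shared the parent String's character array; later versions copy the characters, so the long String can be collected.)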
If you wanted to do this but shed the useless weight what you would want to do is copy the shortened part but not retain a tie to the lengthy bit:
String aNicerMorePoliteShortString = new String(moderatelyLongString.substring(0, 13));
This makes a copy of the short String taken from the long that is its own stand alone array of characters and has nothing to do with that pestering hanger-on that is the long String.
Now doing this will, this time, mark the long String as available for Collection as we have no remaining ties to it:
moderatelyLongString = null;
However
If you just wanted to display a single, different String in a loop on every iteration what you want to do is just (re)use a single variable so that all of the older Strings in memory get released as soon as possible and become available for Garbage Collection as quickly as they can be. Declare your variable outside of the loop and then reassign it inside on every iteration:
String whatYouWantToUse;
for (int i=0; i<100; i++) {
whatYouWantToUse = someStringyGettyMethod();
howYouWantToUseIt(whatYouWantToUse);
}
Each time this loop loops it is assigning a new value to the variable which throws the older value onto the pile of waste for the Garbage Collector to clean up in time, or, you know, whenever it could be bothered to...
Arguably, a better way to do the above is to never try to hold onto the String at all: just pass it straight through from where we get it to where it's wanted:
for (int i=0; i<100; i++) {
howYouWantToUseIt(someStringyGettyMethod());
}
But watch out for over optimising this sort of thing as readability is almost always more important than compactness.
Most compilers are smarter than we'll ever be, or than I will be at least. They can find all the great shortcuts and minifications that can be done to your code and apply their wizardry in a more magnificent way than we mortals could hope to achieve.
If you try to streamline your code too much then all you're left with is two varieties of unreadable code instead of one useful, fast and optimised version and the other maintainable and something Johnny with the off-putting habit of sniffling every 25 seconds two desks over can follow.
this won't work in Java because I've already declared and assigned myString
You are wrong, it will still work but each time you append to the string it will generate a new string.
If you don't want to generate a new string when you append/add to it, then StringBuilder is the solution.
sample:
public static void main(String args[]) {
StringBuilder sb = new StringBuilder("hey");
for (int i = 1; i <= 9; i++) {
sb.append("hey");
}
}
Being immutable doesn't mean that it won't work, it just means that the object you created won't be modified. The String variable can still be assigned to another object (the new string created by concatenating the previous strings on += "hey").
If you want to do it like a mutable object, then just use StringBuilder append() method.
While Strings in Java are immutable, your first example above will work because it creates a new String object every time through the loop, and assigns the newly created string to myString:
public static void main(String[] args) {
String myString = "hey";
for (int i = 1; i <= 9; i++) {
myString += "hey";
}
System.out.println(myString); // prints "heyheyheyheyheyheyheyheyheyhey"
}
While this works, it's inefficient due to the object creation. For a loop with more iterations, or when concatenating longer strings, performance might be a concern – so there are better ways to do it.
One better approach to concatenating Strings in Java is to use StringBuilder. Here's your code adapted to use StringBuilder:
public static void main(String[] args) {
StringBuilder builder = new StringBuilder(50); // estimated length
for (int i = 1; i <= 9; i++) {
builder.append("hey");
}
String myString = builder.toString(); // convert to String when needed
System.out.println(myString); // prints "heyheyheyhey..."
}
StringBuilder has a backing array as a buffer, which is expanded whenever the appended length exceeds the size of the buffer. In this case, we start with an initial allocation of 50 characters.
In a real world situation, you could set the initial size of the StringBuilder buffer based on the size of the input, to minimise the need for expensive buffer expansions.
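A minimal sketch of sizing the buffer from the input, as suggested above. The join helper and its List<String> parameter are assumptions for illustration; the point is only that when the total length is known up front, the StringBuilder can be created with exactly that capacity and never needs to expand.

```java
import java.util.Arrays;
import java.util.List;

class PresizedBuilder {
    // Concatenates all parts using a buffer sized to the exact total length.
    static String join(List<String> parts) {
        int total = 0;
        for (String part : parts) {
            total += part.length();
        }
        StringBuilder builder = new StringBuilder(total); // no expansion needed
        for (String part : parts) {
            builder.append(part);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("hey", "hey", "hey");
        System.out.println(join(parts)); // heyheyhey
    }
}
```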
When you execute the working code in your question, you will simply create a new string in memory each time you append to it.
This means that every time you append something to your string it will be a new string object in memory, implying that it also has a new memory address.
This is because strings are immutable indeed.
If you only want to create a string object once, you should use a StringBuilder, otherwise this solution works fine.
StringBuilders are recommended for building strings that you will - as you do - modify a lot, because modifying a string a lot (i.e., creating many new strings) does a lot of reading and writing in memory.
Your code works perfectly fine, although it's not recommended to work on strings the way you do.
Have a look at Java's StringBuilder and the closely related StringBuffer: http://docs.oracle.com/javase/7/docs/api/java/lang/StringBuffer.html
With the aid of a StringBuilder, you can modify the string.

Java outOfMemory exception in string.split

I have a big txt file with integers in it. Each line in file has two integer numbers separated by whitespace. Size of a file is 63 Mb.
Pattern p = Pattern.compile("\\s");
try (BufferedReader reader = new BufferedReader(new FileReader(filePath))) {
    String line;
    while ((line = reader.readLine()) != null) {
        String[] tokens = p.split(line);
        String s1 = new String(tokens[0]);
        String s2 = new String(tokens[1]);
        int startLabel = Integer.valueOf(s1) - 1;
        int endLabel = Integer.valueOf(s2) - 1;
        Vertex fromV = vertices.get(startLabel);
        Vertex toV = vertices.get(endLabel);
        Edge edge = new Edge(fromV, toV);
        fromV.addEdge(edge);
        toV.addEdge(edge);
        edges.add(edge);
        System.out.println("Edge from " + fromV.getLabel() + " to " + toV.getLabel());
    }
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at java.lang.String.substring(String.java:1913)
at java.lang.String.subSequence(String.java:1946)
at java.util.regex.Pattern.split(Pattern.java:1202)
at java.util.regex.Pattern.split(Pattern.java:1259)
at SCC.main(SCC.java:25)
Why am I getting this exception? How can I change my code to avoid it?
EDIT:
I've already increased the heap size to 2048m.
What is consuming it? That's what I would like to know too.
As far as I know, the JVM should allocate memory for the list of vertices, the set of edges, the buffer for the buffered reader and one small string "line". I don't see where this OutOfMemoryError is coming from.
I read about string.split() method. I think it's causing memory leak, but I don't know what should I do about it.
What you should try first is to reduce the file until it is small enough that the program works. That will allow you to appraise just how large a problem you have.
Second, your problem is definitely unrelated to String#split since you are using it on just one line at a time. What is consuming your heap are the Vertex and Edge instances. You'll have to redesign this towards a smaller footprint, or completely overhaul your algorithms to be able to work with only a part of the graph in memory, the rest on the disk.
P.S. Just a general Java note: don't write
String s1 = new String(tokens[0]);
String s2 = new String(tokens[1]);
you just need
String s1 = tokens[0];
String s2 = tokens[1];
or even just use tokens[0] directly instead of s1, since it's about as clear.
Easiest way: increase your heap size:
Add -Xmx512m -Xms512m (or even more) arguments to jvm
Increase the heap memory limit, using the -Xmx JVM option.
More info here.
You are getting this exception because your program is storing too much data in the java heap.
Although your exception is showing up in the Pattern.split() method, the actual culprit could be any large memory user in your code, such as the graph you are building. Looking at what you provided, I suspect the graph data structure is storing much redundant data. You may want to research a more space-efficient graph structure.
If you are using the Sun JVM, try the JVM option -XX:+HeapDumpOnOutOfMemoryError to create a heap dump and analyze that for any heavy memory users, and use that analysis to optimize your code. See Using HeapDumpOnOutOfMemoryError parameter for heap dump for JBoss for more info.
If that's too much work for you, as others have indicated, try increasing the JVM heap space to a point where your program no longer crashes.
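As one possible direction for a more space-efficient structure, here is a sketch that stores each edge as two ints in parallel arrays instead of as Vertex and Edge objects. The question doesn't show those classes, so this assumes that the two zero-based labels are all that needs to be kept per edge; the class and method names are made up.

```java
import java.util.Arrays;

class CompactEdgeList {
    private int[] starts = new int[16];
    private int[] ends = new int[16];
    private int size = 0;

    // Stores one edge as two ints, doubling the arrays when they fill up.
    void add(int startLabel, int endLabel) {
        if (size == starts.length) {
            starts = Arrays.copyOf(starts, size * 2);
            ends = Arrays.copyOf(ends, size * 2);
        }
        starts[size] = startLabel;
        ends[size] = endLabel;
        size++;
    }

    int size() { return size; }
    int start(int i) { return starts[i]; }
    int end(int i) { return ends[i]; }

    // Parses a line like "12 34" into zero-based labels, as in the question.
    static int[] parseLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        return new int[] {
            Integer.parseInt(tokens[0]) - 1,
            Integer.parseInt(tokens[1]) - 1
        };
    }

    public static void main(String[] args) {
        CompactEdgeList edges = new CompactEdgeList();
        int[] labels = parseLine("12 34");
        edges.add(labels[0], labels[1]);
        System.out.println("Edge from " + edges.start(0) + " to " + edges.end(0));
    }
}
```

Whether this is enough depends on what the algorithm needs from the graph; a heap dump is still the way to confirm where the memory actually goes.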
Whenever you get an OOM while trying to parse stuff, it's just that the method you are using is not scalable. Even though increasing the heap might solve the issue temporarily, it is not scalable. For example, if tomorrow your file size increases by an order of magnitude, you would be back to square one.
I would recommend trying to read the file in pieces, cache x lines of the file, read off it, clear the cache and re-do the process.
You can use either ehcache or guava cache.
The way you parse the string could be changed.
try (Scanner scanner = new Scanner(new FileReader(filePath))) {
    while (scanner.hasNextInt()) {
        int startLabel = scanner.nextInt();
        int endLabel = scanner.nextInt();
        // discard the rest of the line, if there is one.
        if (scanner.hasNextLine()) scanner.nextLine();
        // use start and end.
    }
}
I suspect the memory consumption is actually in the data structure you build rather than how you read the data, but this should make it more obvious.

Measure memory usage of a certain datastructure

I'm trying to measure the memory usage of my own datastructure in my Tomcat Java EE application at various levels of usage.
To measure the memory usage I have tried two strategies:
Runtime freeMemory and totalMemory:
System.gc(); //about 20 times
long start = Runtime.getRuntime().freeMemory();
useMyDataStructure();
long end = Runtime.getRuntime().freeMemory();
System.out.println(start - end);
MemoryPoolMXBean.getPeakUsage():
long before = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
List<MemoryPoolMXBean> memorymxbeans = ManagementFactory.getMemoryPoolMXBeans();
for(MemoryPoolMXBean memorybean: memorymxbeans){
memorybean.resetPeakUsage();
}
useMyDataStructure();
for(MemoryPoolMXBean memorybean: memorymxbeans){
MemoryUsage peak = memorybean.getPeakUsage();
System.out.println(memorybean.getName() + ": " + (peak.getUsed() - before));
}
Method 1 does not output reliable data at all. The data is useless.
Method 2 outputs negative values. Besides, its getName() tells me it's outputting Code Cache, PS Eden Space, PS Survivor Space and PS Old Gen separately.
How can I acquire somewhat consistent memory usage numbers before and after my useMyDataStructure() call in Java? I do not wish to use VisualVM, I prefer to catch the number in a long object and write it to file myself.
Thanks in advance.
edit 1:
useMyDatastructure in the above examples was an attempt to simplify the code. What's really there:
int key = generateKey();
MyOwnObject obj = makeAnObject();
MyContainerClass.getSingleton().addToHashMap(key, obj);
So in essence I'm really trying to measure how much memory the HashMap<Integer, MyOwnObject> in MyContainerClass takes. I will use this memory measurement to perform an experiment where I fill up both the HashMap and MyOwnObject instances.
First of all, sizing objects in Java is non-trivial (as explained very well here).
If you wish to know the size of a particular object, there are at least 2 open source libraries that will do the math for you - java.sizeof and javabi-sizeof.
Now, as for your specific test - System.gc() is only a hint to the JVM and is not guaranteed to trigger a full collection, no matter how many times you call it. Also, is it possible your useMyDataStructure() method does not retain a reference to the object(s) it creates? In that case measuring free memory after calling it is no good, as any allocated objects might already have been cleared out.
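A rough sketch of a before/after measurement that keeps the filled structure reachable while the second reading is taken. The fill method and its byte[] values are stand-ins for the question's HashMap<Integer, MyOwnObject>. The number it prints is only an approximation, since other threads and the collector can change heap usage between the two readings.

```java
import java.util.HashMap;
import java.util.Map;

class HeapDeltaSketch {
    // Currently used heap as seen by the Runtime, in bytes.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Stand-in for filling the real data structure: 1 KiB per entry.
    static Map<Integer, byte[]> fill(int entries) {
        Map<Integer, byte[]> map = new HashMap<>();
        for (int i = 0; i < entries; i++) {
            map.put(i, new byte[1024]);
        }
        return map;
    }

    public static void main(String[] args) {
        System.gc(); // a request only; the JVM may not collect
        long before = usedHeap();

        Map<Integer, byte[]> retained = fill(10000);

        System.gc();
        long after = usedHeap();

        // Using 'retained' after the measurement keeps it reachable until here.
        System.out.println("approx. bytes for " + retained.size()
                + " entries: " + (after - before));
    }
}
```

Repeating the measurement several times and looking at the spread gives a better idea of how noisy the numbers are.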
You could try https://github.com/jbellis/jamm, this works great for me.
