Repeated replace calls lead to java.lang.OutOfMemoryError - java

I am mass-processing very large files. I am calling the following method on each URI in each line:
public String shortenUri(String uri) {
    uri = uri
        .replace("http://www.lemon-model.net/lemon#", "lemon:")
        .replace("http://babelnet.org/rdf/", "bn:")
        .replace("http://purl.org/dc/", "dc:")
        .replace("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "rdf:");
    return uri;
}
Strangely, this leads to the following error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.regex.Pattern$BnM.optimize(Pattern.java:5411)
at java.util.regex.Pattern.compile(Pattern.java:1711)
at java.util.regex.Pattern.<init>(Pattern.java:1351)
at java.util.regex.Pattern.compile(Pattern.java:1054)
at java.lang.String.replace(String.java:2239)
at XYZ.shortenUri(XYZ.java:217)
I did increase -Xms and -Xmx, but it did not help. Strangely, I also could not observe increased memory usage when monitoring the process. Any suggestions on improving performance and reducing memory consumption here?

A quote from Oracle:
Excessive GC Time and OutOfMemoryError
The parallel collector will throw an OutOfMemoryError if too much time is being spent in garbage collection: if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the command line.
The first thing you could try is to increase the heap size even more, for example to a few GB with -Xmx4G.
Another option might be to avoid creating so many objects by not using the replace method, which compiles a new Pattern on every call (as the stack trace shows). Instead you could compile each Pattern once and create only a Matcher per call (see below).
The third option I see is to disable this feature altogether with -XX:-UseGCOverheadLimit
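import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compiled once and reused. Pattern.LITERAL makes the dots match literally instead of acting as regex wildcards.
private static final Pattern PURL_PATTERN = Pattern.compile("http://purl.org/dc/", Pattern.LITERAL);
// other patterns

public static String shortenUri(String uri) {
    // other matchers
    Matcher matcher = PURL_PATTERN.matcher(uri);
    return matcher.replaceAll("dc:");
}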
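A fuller sketch of the same idea, covering all four prefixes from the question. The class name UriShortener and the array layout are assumptions made for illustration, not part of the original code; the point is only that each Pattern is compiled once, at class load time, instead of on every call.

```java
import java.util.regex.Pattern;

final class UriShortener {
    // Namespace URIs from the question, each paired with its short prefix.
    private static final String[][] PREFIXES = {
        {"http://www.lemon-model.net/lemon#", "lemon:"},
        {"http://babelnet.org/rdf/", "bn:"},
        {"http://purl.org/dc/", "dc:"},
        {"http://www.w3.org/1999/02/22-rdf-syntax-ns#", "rdf:"},
    };

    // One precompiled literal pattern per namespace URI.
    private static final Pattern[] PATTERNS = new Pattern[PREFIXES.length];

    static {
        for (int i = 0; i < PREFIXES.length; i++) {
            PATTERNS[i] = Pattern.compile(PREFIXES[i][0], Pattern.LITERAL);
        }
    }

    // Replaces every occurrence of a known namespace URI with its prefix.
    static String shortenUri(String uri) {
        String result = uri;
        for (int i = 0; i < PATTERNS.length; i++) {
            result = PATTERNS[i].matcher(result).replaceAll(PREFIXES[i][1]);
        }
        return result;
    }
}
```

Each call still allocates a Matcher and possibly a new String per pattern, so this reduces garbage rather than eliminating it.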

Related

Java: How to optimize the Heap allocation & Garbage collection?

I have a Spring Batch application that consumes ~16 GB of memory and 75% of CPU (4 cores x 2.5 GHz), and at times it throws an out-of-memory exception.
I want to optimize heap allocation and garbage collection, and I tried the following JVM options to resolve the out-of-memory exception.
I do not understand some of these parameters, as I copied them directly from an article:
JAVA_OPTS="-server -Xmx20480m -Xms512m -XX:+UseConcMarkSweepGC
-XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=30 -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:ParallelCMSThreads=2 -XX:+UseCMSCompactAtFullCollection -XX:+DisableExplicitGC -XX:MaxHeapFreeRatio=70 -XX:MinHeapFreeRatio=40 -XX:MaxTenuringThreshold=0 -XX:NewSize=450m -XX:MaxNewSize=650m"
Would these options really optimize heap allocation and garbage collection and resolve the out-of-memory exception?
First, you need to take a heap dump of the process when it throws the OOM error. You can do that by adding the -XX:+HeapDumpOnOutOfMemoryError JVM option. Once you have the heap dump, analyze it with one of the following tools. Find out which objects are growing in memory, then optimize them. Heap dump analysis tools include:
Eclipse Memory Analyzer
Heap Hero
jxray
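Besides the JVM option, a heap dump can also be requested from inside the running process. The following is a minimal sketch, assuming a HotSpot-based JVM where the com.sun.management.HotSpotDiagnosticMXBean interface is available; the class name HeapDumper is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

final class HeapDumper {
    // Writes a heap dump to the given path. The file must not already exist,
    // and recent JDKs require the name to end in ".hprof".
    static void dumpHeap(String hprofPath) throws IOException {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // 'true' dumps only live (reachable) objects.
        bean.dumpHeap(hprofPath, true);
    }
}
```

The resulting .hprof file can then be opened in any of the tools listed above.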
This error is usually thrown when there is insufficient space to allocate an object on the Java heap, or when the Java process is spending more than 98% of its time doing garbage collection while recovering less than 2% of the heap, and has been doing so for the last 5 garbage collection cycles.
I would first use a Java profiler to determine what methods are allocating large numbers of objects on the heap and make sure that they are no longer referenced after they are not needed. If this doesn't fix the issue and I have confirmed that I need all the objects, the other option would be to increase the max heap size of the program.
This could also happen when you create too many String objects or update those strings again and again.
String literals and interned strings are kept in a hashed string pool, which resides in the heap; other strings are ordinary heap objects. Because strings are immutable, manipulating a string creates a new String object, and the original is not freed until the garbage collector reclaims it.
If you use StringBuilder or StringBuffer (both are mutable, unlike String), the space is better utilised.
Read more about string immutability and why StringBuilder should be preferred when you need to perform a lot of string manipulation:
StringBuilder-StringBuffer-Strings in java
Why strings are immutable in java?
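To illustrate the StringBuilder advice, here is a small sketch that builds one result in a single mutable buffer instead of creating a new String on every concatenation. The class and method names (LineJoiner, join) are made up for this example.

```java
import java.util.List;

final class LineJoiner {
    // Joins the parts with the separator using one StringBuilder,
    // so no intermediate String is created per iteration.
    static String join(List<String> parts, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(parts.get(i));
        }
        return sb.toString();
    }
}
```

With repeated `result = result + part` in the loop, every iteration would copy the whole accumulated string into a new object.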

Java Heap Space: Out of Memory - no garbage collection?

I have limited knowledge when it comes to the JVM and Heap Space so I'm trying to understand some behavior that we're seeing.
There's a situation where a user can request data that exceeds our heap space and therefore causes an OOM: Java Heap Space. Okay, that makes sense and we understand why that's happening.
After the OOM occurs, I notice that the amount of memory reported being used is the heap size + what our application normally runs at... however, after the OOM it doesn't appear to get garbage collected and return to normal memory levels; it just stays high. Is that expected? Apologies for such a simple question, I wasn't sure where to find this information elsewhere.
Thanks in advance.
Edit:
Here's what the code boils down to: we (mistakenly) allow a user to fetch too large a range, and when we try to return the List of objects it causes the OOM error (at least that's my theory).
What in this code would cause an object to not get cleaned up?
public List<Class> getClassesByRange(Timestamp startDate, Timestamp endDate) {
    final Session session = this.sessionManager.openNewSession();
    try {
        final Criteria criteria = session.createCriteria(Class.class);
        criteria.addOrder(Order.desc("sorting"));
        criteria.setFetchMode("someObject", FetchMode.JOIN);
        criteria.setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY);
        criteria.add(Restrictions.ge("createDateTime", startDate));
        criteria.add(Restrictions.le("createDateTime", endDate));
        return criteria.list();
    } finally {
        session.close();
    }
}

Measure memory usage of a certain datastructure

I'm trying to measure the memory usage of my own datastructure in my Tomcat Java EE application at various levels of usage.
To measure the memory usage I have tried two strategies:
Runtime freeMemory and totalMemory:
System.gc(); //about 20 times
long start = Runtime.getRuntime().freeMemory();
useMyDataStructure();
long end = Runtime.getRuntime().freeMemory();
System.out.println(start - end);
MemoryPoolMXBean.getPeakUsage():
long before = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
List<MemoryPoolMXBean> memorymxbeans = ManagementFactory.getMemoryPoolMXBeans();
for (MemoryPoolMXBean memorybean : memorymxbeans) {
    memorybean.resetPeakUsage();
}
useMyDataStructure();
for (MemoryPoolMXBean memorybean : memorymxbeans) {
    MemoryUsage peak = memorybean.getPeakUsage();
    System.out.println(memorybean.getName() + ": " + (peak.getUsed() - before));
}
Method 1 does not output reliable data at all. The data is useless.
Method 2 outputs negative values. Besides, its getName() tells me it is reporting Code Cache, PS Eden Space, PS Survivor Space and PS Old Gen separately.
How can I acquire somewhat consistent memory usage numbers before and after my useMyDataStructure() call in Java? I do not wish to use VisualVM; I would prefer to capture the number in a long and write it to a file myself.
Thanks in advance.
edit 1:
useMyDataStructure() in the above examples was an attempt to simplify the code. What's really there:
int key = generateKey();
MyOwnObject obj = makeAnObject();
MyContainerClass.getSingleton().addToHashMap(key, obj);
So in essence I'm really trying to measure how much memory the HashMap<Integer, MyOwnObject> in MyContainerClass takes. I will use this memory measurement to perform an experiment where I fill up both the HashMap and MyOwnObject instances.
First of all, sizing objects in Java is non-trivial (as explained very well here).
If you wish to know the size of a particular object, there are at least two open-source libraries that will do the math for you: java.sizeof and javabi-sizeof.
Now, as for your specific test: System.gc() is only a hint that modern (HotSpot) JVMs are free to ignore, no matter how many times you call it. Also, is it possible that your useMyDataStructure() method does not retain a reference to the object(s) it creates? In that case measuring free memory after calling it is no good, as any allocated objects might already have been collected.
You could try https://github.com/jbellis/jamm; it works great for me.
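If an external library is not an option, a rough before/after measurement that keeps the built structure reachable might look like the sketch below. HeapDelta and its Result type are hypothetical names. Because gc() is only a request and other threads (for example in Tomcat) allocate concurrently, the number is an approximation and can even come out negative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.function.Supplier;

final class HeapDelta {
    // Holds the built structure (so it stays reachable) and the measured delta.
    static final class Result<T> {
        final T value;
        final long usedBytesDelta;

        Result(T value, long usedBytesDelta) {
            this.value = value;
            this.usedBytesDelta = usedBytesDelta;
        }
    }

    // Measures used heap before and after building the structure.
    static <T> Result<T> measure(Supplier<T> builder) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // a request, effectively the same as System.gc()
        long before = memory.getHeapMemoryUsage().getUsed();
        T value = builder.get();
        memory.gc();
        long after = memory.getHeapMemoryUsage().getUsed();
        return new Result<>(value, after - before);
    }
}
```

Repeating the measurement several times and comparing the results gives a better idea of how noisy it is.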

Java heap memory usage exception when using a for loop

When I run the program below, I get an exception when the for loop reaches i=1031521. How to overcome memory usage of this type?
class wwww
{
    public static void main(String args[])
    {
        String abc[]=new String[4194304];
        String wwf="";
        int s_count=524286;
        for(int i=0;i<4194304;i++)
        {
            System.out.println("----------enter--------"+i);
            abc[i]=""+i;
            System.out.println("----------exit--------"+i);
        }
    }
}
The exception is:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
at java.lang.StringBuilder.append(StringBuilder.java:119)
at wwww.main(wwww.java:12)
This is because your program uses up all the heap space allocated to your JVM.
You can pass an argument when running the program to specify the heap size that you would like to allocate.
This is an example:
java -Xmx256m MyClass
Here a maximum of 256 MB of heap space will be allocated
How to overcome memory usage of this type?
Don't perform memory usage of this type. You are creating 4194304 strings, of the general form ""+i. You don't need 4194304 strings of that form all at once. You only need one of them at a time, if any, and you can create it every time you need it.
You could either:
Increase the heap size that you give to your program. This is done via the -Xmx command-line argument to java.
Re-engineer the program to use less memory. Do you really need to keep all those strings in memory at once?
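As an illustration of the second option, the sketch below visits the same kind of strings one at a time instead of storing all of them in an array. The class name OnDemandStrings and the choice to sum the lengths are assumptions made purely to give the loop something to do.

```java
final class OnDemandStrings {
    // Creates the decimal string for each i in [0, count), uses it,
    // and lets it become garbage before the next one is created.
    static long totalLength(int count) {
        long total = 0;
        for (int i = 0; i < count; i++) {
            String s = Integer.toString(i); // only one such string is reachable at a time
            total += s.length();
        }
        return total;
    }
}
```

Calling `OnDemandStrings.totalLength(4194304)` allocates the same number of strings as the original program, but none of them is retained after its iteration, so the garbage collector can reclaim them.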

Does Immutability of Strings in Java cause Out Of Memory

I have written a simple Java program that reads a million rows from the Database and writes them to a File.
The max memory that this program can use is 512M.
I frequently notice that this program runs Out Of Memory for more than 500K rows.
Since the program is very simple, it is easy to verify that it doesn't have a memory leak. The program fetches a thousand rows from the database, writes them to a file using streams, and then fetches the next thousand rows. The size of each row varies, but none of the rows is huge. On taking a dump while the program is running, the older strings are easily seen on the heap. These strings are unreachable, which means they are waiting to be garbage collected. I also believe that the GC doesn't necessarily run during the execution of this program, which leaves strings in the heap longer than they should be.
I think the solution would be to use long char arrays (or StringBuffer) instead of String objects to store the lines returned by the DB. The assumption is that I can overwrite the contents of a char array, which means the same array can be reused across iterations without allocating new space each time.
Pseudocode :
Create an Array of Arrays using new char[1000][1000];
Fill the thousand rows from DB to the Array.
Write Array to File.
Use the same Array for next thousand rows
If the above pseudocode fixes my problem, then in reality the immutable nature of the String class hurts the Java programmer, as there is no direct way to reclaim the space used by a String even though the String is no longer in use.
Are there any better alternatives for this problem?
P.S.: I didn't rely on static analysis alone. I used the YourKit profiler to inspect a heap dump. The dump clearly says 96% of the Strings have no GC roots, which means they are waiting to be garbage collected. Also, I don't use substring in my code.
Immutability of the class String has absolutely nothing to do with OutOfMemoryError. Immutability means that it cannot ever change, only that.
If you run out of memory, it is simply because the garbage collector was unable to find any garbage to collect.
In practice, it is likely that you are holding references to way too many Strings in memory (for instance, do you have any kind of collection holding strings, such as List, Set, Map?). You must destroy these references to allow the garbage collector to do its job and free up some memory.
The simple answer to this question is 'no'. I suspect you're hanging onto references longer than you think.
Are you closing those streams properly? Are you intern()ing those strings? On JVMs before Java 7, that would result in a permanent copy being made of the string if it doesn't exist already, taking up permgen space (which isn't collected). Are you taking substring() of a larger string? In older JDK versions, strings make use of the flyweight pattern and will share a character array if created using substring(). See here for more details.
You suggest that garbage collection isn't running. The option -verbose:gc will log the garbage collections and you can see immediately what's going on.
The only thing about strings which can cause an OutOfMemoryError is if you retain small sections of a much larger string. If you are doing this it should be obvious from a heap dump.
When you take a heap dump, I suggest you look only at live objects, in which case any retained objects you don't need are most likely due to a bug in your code.
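The fetch-and-write loop described in the question can be sketched so that no row is referenced after it has been written. Here an Iterator<String> stands in for the database cursor, and the RowExporter name is hypothetical; this is an assumption-laden outline, not the asker's actual code.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

final class RowExporter {
    // Writes each row as one line and returns the number of rows written.
    // No collection of rows is kept, so every String is collectable
    // as soon as the loop moves on to the next one.
    static long export(Iterator<String> rows, Path target) throws IOException {
        long written = 0;
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            while (rows.hasNext()) {
                out.write(rows.next());
                out.newLine();
                written++;
            }
        } // try-with-resources flushes and closes the writer, even on failure
        return written;
    }
}
```

If memory still grows with a structure like this, the references are probably being held elsewhere, for example by the database driver buffering the whole result set.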
