Frequent garbage collection in a Java web app

I have a web app that serializes a Java bean into XML or JSON according to the user request.
I am facing a mind-bending problem: when I put a little bit of load on it, it quickly uses all allocated memory and reaches max capacity. I then observe full GC working really hard every 20-40 seconds.
It doesn't look like a memory leak issue... but I am not quite sure how to troubleshoot this.
The bean that is serialized to XML/JSON has references to other beans, and those to others. I use json-lib and JAXB to serialize the beans.
The YourKit memory profiler is telling me that a char[] is the most memory-consuming live object...
Any insight is appreciated.

There are two possibilities: you've got a memory leak, or your webapp is just generating lots of garbage.
The brute-force way to tell if you've got a memory leak is to run it for a long time and see if it falls over with an OOME. Or turn on GC logging, and see if the average space left after garbage collection continually trends upwards over time.
Whether or not you have a memory leak, you can probably improve performance (reduce the percentage of time spent in GC) by increasing the max heap size. The fact that your webapp is seeing lots of full GCs suggests to me that it needs more heap. (This is just a band-aid solution if you have a memory leak.)
If it turns out that you are not suffering from a memory leak, then you should take a look at why your application is generating so much garbage. It could be down to the way that you are doing the XML and JSON serialization.
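A minimal sketch of how you might do both, assuming a HotSpot JVM of that era (the jar name and heap size are placeholders; on JDK 9+ the Print flags are replaced by -Xlog:gc*):

java -Xmx1024m -verbose:gc -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -jar your-webapp.jar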

Why do you think you have a problem? GC is a natural and normal thing to happen. We have customers that GC every second (for less than 100ms duration), and that's fine as long as memory keeps getting reclaimed.
GCing every 20-40 seconds isn't a problem IMO - as long as it doesn't take a large % of that 20-40s. Most major commercial JVMs aim to keep GC in the 5-10% of time range (so 1-4 seconds of that 20-40s). Posting more data in the form of GC logs might help, and tools like GCMV can help you visualize your GC profile and get tuning recommendations.

It's impossible to diagnose this without a lot more information - code and GC logs - but my guess would be that you're reading data in as large strings, then chopping out little bits with substring(). When you do that (at least on the JDKs of that era, before String.substring() was changed to copy in Java 7u6), the substring is made using the same underlying character array as the parent string, and so as long as it's alive, it will keep that array in memory. That means code like this:
String big = new String(new char[1000000]);   // a string of one million characters
String small = big.substring(0, 1);
big = null;
Will still keep the huge string's character data in memory. If this is the case, then you can address it by forcing the small strings to use fresh, smaller, character arrays by constructing new instances:
small = new String(small);
But like I said, this is just a guess.

I'm not sure how much of it is in your code and how much might be in the tools you are using, but there are some key things to watch for.
One of the worst is if you constantly add to strings in loops. A simple "hello" + "world" is no problem at all (the compiler is actually very smart about that), but if you concatenate in a loop it will constantly reallocate and copy the string. Use StringBuilder where you can, as in the sketch below.
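For illustration, a sketch of the difference (the data is made up):

String[] parts = { "h", "e", "l", "l", "o" };

// String concatenation in a loop: each += copies everything built so far into a new String.
String s = "";
for (String part : parts) {
    s += part;
}

// StringBuilder: appends into one growing buffer, producing far less garbage.
StringBuilder sb = new StringBuilder();
for (String part : parts) {
    sb.append(part);
}
String result = sb.toString();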
There are profilers for Java that should quickly point you to where the allocations are taking place. Just spend some time with a profiler while your Java app is running and you will probably be able to reduce your GCs to virtually nothing, unless the problem is inside your libraries--and even then you may figure out some way to fix it.
Things you allocate and then free quickly don't require time in the GC phase--it's pretty much free. Be sure you aren't keeping Strings around longer than you need them. Bring them in, process them and return to your previous state before returning from your request handler.

You should attach YourKit and record allocations (e.g., every 10th allocation, including all large ones). They have a step-by-step guide on diagnosing excessive GC:
http://www.yourkit.com/docs/90/help/excessive_gc.jsp

To me that sounds like you are trying to serialize a recursive (or at least very deep, almost recursive) object graph with an encoder that is not prepared for it.

Java's native XML APIs are really "noisy" and generally wasteful in terms of resources, which means that if your requests and XML/JSON generation cycles are short-lived, the GC will have lots to clean up.
I have debugged a very similar case and found this out the hard way; the only way I could at least somewhat improve the situation without major refactoring was explicitly calling GC (System.gc()) together with the appropriate VM flags, which turn it from a no-op call into a maybe-op call.

I would start by inspecting my running application to see what was being created on the heap.
HPROF can collect this information for you, which you can then analyse using HAT.
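For example, using the HPROF agent that shipped with older JDKs (it was removed in JDK 9) together with the jhat/HAT tool; the application name is a placeholder and the dump file name may differ on your JDK:

java -agentlib:hprof=heap=dump,format=b,depth=10 YourWebApp
jhat java.hprof   (HAT then serves its analysis UI on http://localhost:7000 by default)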

To debug issues with memory allocations, InMemProfiler can be used at the command line. Collected object allocations can be tracked and collected objects can be split into buckets based on their lifetimes.
In trace mode this tool can be used to identify the source of memory allocations.

Related

Best practice for creating millions of small temporary objects

What are the "best practices" for creating (and releasing) millions of small objects?
I am writing a chess program in Java and the search algorithm generates a single "Move" object for each possible move, and a nominal search can easily generate over a million move objects per second. The JVM GC has been able to handle the load on my development system, but I'm interested in exploring alternative approaches that would:
Minimize the overhead of garbage collection, and
reduce the peak memory footprint for lower-end systems.
The vast majority of the objects are very short-lived, but about 1% of the moves generated are persisted and returned as part of the result, so any pooling or caching technique would have to provide the ability to exclude specific objects from being reused.
I don't expect fully-fleshed out example code, but I would appreciate suggestions for further reading/research, or open source examples of a similar nature.
Run the application with verbose garbage collection:
java -verbose:gc
And it will tell you when it collects. There will be two types of sweeps, a fast (minor) one and a full sweep:
[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]
The numbers around the arrow are the heap occupancy before and after the collection, and the figure in parentheses is the total heap size.
As long as it is just doing GC and not a full GC you are home safe. The regular GC is a copy collector in the 'young generation', so objects that are no longer referenced are simply just forgotten about, which is exactly what you would want.
Reading Java SE 6 HotSpot Virtual Machine Garbage Collection Tuning is probably helpful.
Since version 6, the server mode of the JVM employs an escape analysis technique. Using it you can avoid GC altogether.
Well, there are several questions in one here!
1 - How are short-lived objects managed?
As previously stated, the JVM can perfectly well deal with a huge number of short-lived objects, since it follows the weak generational hypothesis.
Note that we are speaking of objects that reached the main memory (heap). This is not always the case. Many of the objects you create do not even leave a CPU register. For instance, consider this for loop:
for (int i = 0; i < max; i++) {
    // stuff that involves i
}
Let's not think about loop unrolling (an optimisation that the JVM performs heavily on your code). If max is equal to Integer.MAX_VALUE, your loop might take some time to execute. However, the i variable never escapes the loop block. Therefore the JVM will put that variable in a CPU register, regularly increment it, but never send it back to main memory.
So creating millions of objects is not a big deal if they are used only locally. They will be dead before ever being stored in Eden, so the GC won't even notice them.
2 - Is it useful to reduce the overhead of the GC ?
As usual, it depends.
First, you should enable GC logging to have a clear view about what is going on. You can enable it with -Xloggc:gc.log -XX:+PrintGCDetails.
If your application is spending a lot of time in a GC cycle, then, yes, tune the GC, otherwise, it might not be really worth it.
For instance, if you have a young GC every 100ms that takes 10ms, you spend 10% of your time in the GC and you have 10 collections per second (which is huge). In such a case I would not spend any time on GC tuning itself, since those 10 collections per second would still be there; the real gain comes from reducing the allocation rate, as in the experience below.
3 - Some experience
I had a similar problem on an application that was creating a huge number of instances of a given class. In the GC logs, I noticed that the creation rate of the application was around 3 GB/s, which is way too much (come on... 3 gigabytes of data every second?!).
The problem: too-frequent GCs caused by too many objects being created.
In my case, I attached a memory profiler and noticed that a class represented a huge percentage of all my objects. I tracked down the instantiations to find out that this class was basically a pair of booleans wrapped in an object. In that case, two solutions were available:
Rework the algorithm so that I do not return a pair of booleans but instead I have two methods that return each boolean separately
Cache the objects, knowing that there were only 4 different instances
I chose the second one, as it had the least impact on the application and was easy to introduce. It took me minutes to put in place a factory with a non-thread-safe cache (I did not need thread safety since I would eventually have only 4 different instances).
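A minimal sketch of that kind of factory, assuming the class is essentially a pair of booleans (names are illustrative; here the four instances are created eagerly, which sidesteps the thread-safety question):

// Only four combinations exist, so the factory always returns one of four shared instances.
final class BoolPair {
    private static final BoolPair[] CACHE = {
        new BoolPair(false, false), new BoolPair(false, true),
        new BoolPair(true, false),  new BoolPair(true, true)
    };

    final boolean first, second;

    private BoolPair(boolean first, boolean second) {
        this.first = first;
        this.second = second;
    }

    static BoolPair of(boolean first, boolean second) {
        return CACHE[(first ? 2 : 0) + (second ? 1 : 0)];
    }
}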
The allocation rate went down to 1 GB/s, and so did the frequency of young GC (divided by 3).
Hope that helps!
If you have just value objects (that is, no references to other objects) and really, I mean really, tons and tons of them, you can use direct ByteBuffers with native byte ordering (the latter is important), and you need a few hundred lines of code to allocate/reuse them plus getters/setters. Getters look similar to long getQuantity(int tupleIndex) { return buffer.getLong(tupleIndex + QUANTITY_OFFSET); }
That would solve the GC problem almost entirely, as long as you allocate only once, that is, one huge chunk, and then manage the objects yourself. Instead of references you'd have only an index (that is, an int) into the ByteBuffer that has to be passed along. You may need to do the memory alignment yourself as well.
The technique would feel like using C and void*, but with some wrapping it's bearable. A performance downside could be bounds checking, if the compiler fails to eliminate it. A major upside is locality if you process the tuples like vectors; the lack of an object header reduces the memory footprint as well.
Other than that, it's likely you won't need such an approach, as the young generation of virtually all JVMs dies trivially and the allocation cost is just a pointer bump. Allocation cost can be a bit higher if you use final fields, as they require a memory fence on some platforms (namely ARM/POWER); on x86 it is free, though.
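A rough sketch of the idea, with illustrative field names, offsets and a fixed tuple size (not anyone's actual code):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Each "object" is a fixed-size slot in one big direct buffer; an int index stands in for a reference.
final class TupleStore {
    private static final int QUANTITY_OFFSET = 0;   // long
    private static final int PRICE_OFFSET    = 8;   // long
    private static final int TUPLE_SIZE      = 16;

    private final ByteBuffer buffer;

    TupleStore(int capacity) {
        // One allocation up front, with native byte ordering as noted above.
        buffer = ByteBuffer.allocateDirect(capacity * TUPLE_SIZE).order(ByteOrder.nativeOrder());
    }

    long getQuantity(int tupleIndex) {
        return buffer.getLong(tupleIndex * TUPLE_SIZE + QUANTITY_OFFSET);
    }

    void setQuantity(int tupleIndex, long value) {
        buffer.putLong(tupleIndex * TUPLE_SIZE + QUANTITY_OFFSET, value);
    }

    long getPrice(int tupleIndex) {
        return buffer.getLong(tupleIndex * TUPLE_SIZE + PRICE_OFFSET);
    }
}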
Assuming you find GC is an issue (as others point out, it might not be), you will be implementing your own memory management for your special case, i.e. a class which suffers massive churn. Give object pooling a go; I've seen cases where it works quite well. Implementing object pools is a well-trodden path so there's no need to revisit it here, but look out for:
multi-threading: using thread-local pools might work for your case
backing data structure: consider using ArrayDeque as it performs well on remove and has no allocation overhead (see the sketch after this list)
limit the size of your pool :)
measure before/after, etc.
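Here is a minimal, single-threaded sketch of such a pool backed by an ArrayDeque (the factory and size limit are illustrative):

import java.util.ArrayDeque;
import java.util.function.Supplier;

// A naive pool: hands out recycled instances when it has them, creates new ones otherwise.
final class SimplePool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxSize;                    // limit the size of your pool

    SimplePool(Supplier<T> factory, int maxSize) {
        this.factory = factory;
        this.maxSize = maxSize;
    }

    T acquire() {
        T obj = free.poll();
        return obj != null ? obj : factory.get(); // allocate only when the pool is empty
    }

    void release(T obj) {
        if (free.size() < maxSize) {              // drop excess instead of growing forever
            free.push(obj);
        }
    }
}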
I've met a similar problem. First of all, try to reduce the size of the small objects. We introduced some shared default field values and referenced them from each object instance instead of creating new ones.
For example, MouseEvent has a reference to the Point class. We cached Points and referenced them instead of creating new instances. The same goes for, say, empty strings.
Another source was multiple booleans, which we replaced with one int, using just one byte of the int for each boolean.
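A rough sketch of that packing idea, using one byte of an int per flag as described (a bit per flag would be even more compact; the class is illustrative):

// Packs up to four boolean fields into a single int instead of four separate fields.
final class PackedFlags {
    private int packed;

    void set(int index, boolean value) {          // index 0..3
        int shift = index * 8;
        packed = (packed & ~(0xFF << shift)) | ((value ? 1 : 0) << shift);
    }

    boolean get(int index) {
        return ((packed >>> (index * 8)) & 0xFF) != 0;
    }
}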
I dealt with this scenario with some XML processing code some time ago. I found myself creating millions of XML tag objects which were very small (usually just a string) and extremely short-lived (failure of an XPath check meant no-match so discard).
I did some serious testing and came to the conclusion that I could only achieve about a 7% improvement on speed using a list of discarded tags instead of making new ones. However, once implemented I found that the free queue needed a mechanism added to prune it if it got too big - this completely nullified my optimisation so I switched it to an option.
In summary - probably not worth it - but I'm glad to see you are thinking about it, it shows you care.
Given that you are writing a chess program there are some special techniques you can use for decent performance. One simple approach is to create a large array of longs (or bytes) and treat it as a stack. Each time your move generator creates moves it pushes a couple of numbers onto the stack, e.g. move from square and move to square. As you evaluate the search tree you will be popping off moves and updating a board representation.
If you want expressive power use objects. If you want speed (in this case) go native.
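A minimal sketch of such a primitive move stack (the encoding and capacity are illustrative):

// Moves are packed into longs and pushed onto a plain array: no per-move objects at all.
final class MoveStack {
    private final long[] stack = new long[1 << 16];
    private int top;

    void push(int fromSquare, int toSquare) {
        stack[top++] = ((long) fromSquare << 32) | (toSquare & 0xFFFFFFFFL);
    }

    long pop()                 { return stack[--top]; }
    static int from(long move) { return (int) (move >>> 32); }
    static int to(long move)   { return (int) move; }
}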
One solution I've used for such search algorithms is to create just one Move object, mutate it with the new move, and then undo the move before leaving the scope. You are probably analyzing just one move at a time, and then just storing the best move somewhere.
If that's not feasible for some reason, and you want to decrease peak memory usage, a good article about memory efficiency is here: http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf
Just create your millions of objects and write your code in the proper way: don't keep unnecessary references to these objects. GC will do the dirty job for you. You can play around with verbose GC as mentioned to see if they are really GC'd. Java IS about creating and releasing objects. :)
I think you should read about stack allocation in Java and escape analysis.
Because if you go deeper into this topic you may find that your objects are not even allocated on the heap, and they are not collected by GC the way that objects on the heap are.
There is a wikipedia explanation of escape analysis, with example of how this works in Java:
http://en.wikipedia.org/wiki/Escape_analysis
I am not a big fan of GC, so I always try finding ways around it. In this case I would suggest using the Object Pool pattern:
The idea is to avoid creating new objects by storing them in a stack so you can reuse them later.
import java.util.LinkedList;

class MyPool {
    LinkedList<Object> stack = new LinkedList<>();
    Object getObject()          { return stack.isEmpty() ? new Object() : stack.pop(); } // takes from stack; if it's empty, creates a new one
    void returnObject(Object o) { stack.push(o); }                                       // adds to stack
}
Object pools can provide tremendous (sometimes 10x) improvements over object allocation on the heap. But the above implementation using a linked list is both naive and wrong! The linked list creates node objects to manage its internal structure, nullifying the effort.
A ring buffer using an array of objects works well. In the example given (a chess program managing moves), the ring buffer should be wrapped in a holder object for the list of all computed moves. Only references to the moves holder object would then be passed around.
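A rough sketch of that kind of ring buffer holder (the Move class and sizing are illustrative):

// Pre-allocates Move objects once and hands them out in a cycle; callers pass around the
// holder (plus an index) rather than individual Move references.
final class MoveRing {
    static final class Move { int from, to; }   // hypothetical move type

    private final Move[] ring;
    private int next;

    MoveRing(int capacity) {
        ring = new Move[capacity];
        for (int i = 0; i < capacity; i++) {
            ring[i] = new Move();               // the only allocations ever made
        }
    }

    Move claim() {
        Move m = ring[next];
        next = (next + 1) % ring.length;
        return m;                               // must be consumed before it comes around again
    }
}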

How to memory profile in Java?

I'm still learning the ropes of Java, so sorry if there's an obvious answer to this. I have a program that is taking a ton of memory and I want to figure out a way to reduce its usage, but after reading many SO questions I have the idea that I need to prove where the problem is before I start optimizing it.
So here's what I did: I added a breakpoint at the start of my program and ran it, then I started VisualVM and had it profile the memory (I also did the same thing in NetBeans just to compare the results, and they are the same). My problem is that I don't know how to read them; the largest area just says char[] and I can't see any code or anything (which makes sense because VisualVM is connecting to the JVM and can't see my source, but NetBeans also does not show me the source as it does when doing CPU profiling).
Basically what I want to know is in which variable (and hopefully more details, like in which method) all the memory is being used, so I can focus on working there. Is there an easy way to do this? Right now I am using Eclipse and Java to develop (and installed VisualVM and NetBeans specifically for profiling, but I am willing to install anything else that you feel gets this job done).
EDIT: Ideally, I'm looking for something that will take all my objects and sort them by size (so I can see which ones are hogging memory). Currently it returns generic information such as String[] or int[], but I want to know which object it's referring to so I can work on optimizing its size.
Strings are problematic
Basically, in Java, String references (things that use char[] behind the scenes) will dominate most business applications memory-wise. How they are created determines how much memory they consume in the JVM.
They are fundamental to most business applications as a data type, and they are also one of the most memory-hungry. This isn't just a Java thing: String data types take up lots of memory in pretty much every language and runtime library, because at best they are arrays of 1 byte per character, and at worst (Unicode) they are arrays of multiple bytes per character.
Once, when profiling CPU usage on a web app that also had an Oracle JDBC dependency, I discovered that StringBuffer.append() dominated the CPU cycles by many orders of magnitude over all other method calls combined, much less any other single method call. The JDBC driver did lots and lots of String manipulation, kind of the trade-off of using PreparedStatements for everything.
What you are concerned about you can't control, not directly anyway
What you should focus on is what is in your control, which is making sure you don't hold on to references longer than you need to, and that you are not duplicating things unnecessarily. The garbage collection routines in Java are highly optimized, and if you learn how their algorithms work, you can make sure your program behaves in the optimal way for those algorithms to work.
Java Heap Memory isn't like manually managed memory in other languages, those rules don't apply
What are considered memory leaks in other languages aren't the same thing/root cause as in Java with its garbage collection system.
Most likely, in Java, memory isn't consumed by one single uber-object that is leaking (a dangling reference in other environments).
It is most likely lots of smaller allocations because of StringBuffer/StringBuilder objects not sized appropriately on first instantiation and then having to automatically grow their char[] arrays to hold subsequent append() calls.
These intermediate objects may be held around longer than expected by the garbage collector because of the scope they are in and lots of other things that can vary at run time.
EXAMPLE: the garbage collector may decide that there are candidates, but because it considers that there is plenty of memory still to be had, it might be too expensive time-wise to flush them out at that point, so it will wait until memory pressure gets higher.
The garbage collector is really good now, but it isn't magic, if you are doing degenerate things, it will cause it to not work optimally. There is lots of documentation on the internet about the garbage collector settings for all the versions of the JVMs.
These unreferenced objects may just not have been around long enough for the garbage collector to decide to expunge them from memory, or there could be references to them held by some other object (a List, for example) that you don't realize still points to that object. This is what is most commonly referred to as a leak in Java; more specifically, a reference leak.
EXAMPLE: if you know you need to build a 4K String using a StringBuilder, create it with new StringBuilder(4096); not with the default capacity, which is tiny (16 characters) and will immediately start creating garbage that can represent many times what you think the object should be size-wise.
You can discover how many of what types of objects are instantiated with VisualVM; this will tell you what you need to know. There isn't going to be one big flashing light that points at a single instance of a single class and says, "This is the big memory consumer!", unless there is only one instance of some char[] that you are reading some massive file into, and even that is hard to single out, because lots of other classes use char[] internally; and in that case you pretty much knew it already.
I don't see any mention of OutOfMemoryError
You probably don't have a problem in your code; the garbage collection system just might not be getting put under enough pressure to kick in and deallocate objects that you think it should be cleaning up. What you think is a problem probably isn't, not unless your program is crashing with OutOfMemoryError. This isn't C, C++, Objective-C, or any other manually memory-managed language/runtime. You don't get to decide what is in memory or not at the level of detail you are expecting to.
In JProfiler, you can go to the heap walker and activate the biggest objects view. You will see the objects that retain the most memory. "Retained" memory is the memory that would be freed by the garbage collector if you removed the object.
You can then open the object nodes to see the reference tree of the retained objects.
Disclaimer: My company develops JProfiler
I would recommend capturing heap dumps and using a tool like Eclipse MAT that lets you analyze them. There are many tutorials available. It provides a view of the dominator tree to provide insight into the relationships between the objects on the heap. Specifically for what you mentioned, the "path to GC roots" feature of MAT will tell you where the majority of those char[], String[] and int[] objects are being referenced. JVisualVM can also be useful in identifying leaks and allocations, particularly by using snapshots with allocation stack traces. There are quite a few walk-throughs of the process of getting the snapshots and comparing them to find the allocation point.
The Java JDK comes with JVisualVM under the bin folder. Once your application (for example, your application server) is running, you can start VisualVM and connect it to your local JVM; it will show you memory allocation and let you perform a heap dump.
If you use VisualVM to check your memory usage, it focuses on the data, not the methods. Maybe your big char[] data is caused by many String values? Unless you are using recursion, the data will not be from local variables, so you can focus on the methods that insert elements into large data structures. To find out which precise statements cause your "memory leakage", I suggest you additionally:
read Josh Bloch's Effective Java, Item 6 (Eliminate obsolete object references)
use a logging framework and log instance creations at the highest verbosity level.
There are generally two distinct approaches to analyse Java code to gain an understanding of its memory allocation profile. If you're trying to measure the impact of a specific, small section of code – say you want to compare two alternative implementations in order to decide which one gives better runtime performance – you would use a microbenchmarking tool such as JMH.
While you can pause the running program, the JVM is a sophisticated runtime that performs a variety of housekeeping tasks and it's really hard to get a "point in time" snapshot and an accurate reading of the "level of memory usage". It might allocate/free memory at a rate that does not directly reflect the behaviour of the running Java program. Similarly, performing a Java object heap dump does not fully capture the low-level machine specific memory layout that dictates the actual memory footprint, as this could depend on the machine architecture, JVM version, and other runtime factors.
Tools like JMH get around this by repeatedly running a small section of code, and observing a long-running average of memory allocations across a number of invocations. E.g. in the GC profiling sample JMH benchmark, the derived ·gc.alloc.rate.norm metric gives a reasonably accurate per-invocation normalised memory cost.
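A minimal sketch of wiring up the JMH GC profiler (this assumes the JMH libraries and annotation processor are on the classpath; the benchmark body is purely illustrative):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.profile.GCProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class AllocBenchmark {

    @Benchmark
    public String concat() {
        // Allocates a new String per invocation, so ·gc.alloc.rate.norm will be non-zero.
        return "hello" + System.nanoTime();
    }

    public static void main(String[] args) throws Exception {
        Options opts = new OptionsBuilder()
                .include(AllocBenchmark.class.getSimpleName())
                .addProfiler(GCProfiler.class)   // adds the ·gc.* metrics to the results
                .build();
        new Runner(opts).run();
    }
}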
In the more general case, you can attach a profiler to a running application and get JVM-level metrics, or perform a heap dump for offline analysis. Some commonly used tools for profiling full applications are Async Profiler and the newly open-sourced Java Flight Recorder in conjunction with Java Mission Control to visualise results.

Is Java Native Memory Faster than the heap?

I'm exploring options to help my memory-intensive application, and in doing so I came across Terracotta's BigMemory. From what I gather, they take advantage of non-garbage-collected, off-heap "native memory," and apparently this is about 10x slower than heap-storage due to serialization/deserialization issues. Prior to reading about BigMemory, I'd never heard of "native memory" outside of normal JNI. Although BigMemory is an interesting option that warrants further consideration, I'm intrigued by what could be accomplished with native memory if the serialization issue could be bypassed.
Is Java native memory faster (I think this entails ByteBuffer objects?) than traditional heap memory when there are no serialization issues (for instance if I am comparing it with a huge byte[])? Or do the vagaries of garbage collection, etc. render this question unanswerable? I know "measure it" is a common answer around here, but I'm afraid I would not set up a representative test as I don't yet know enough about how native memory works in Java.
Direct memory is faster when performing IO because it avoids one copy of the data. However, for 95% of applications you won't notice the difference.
You can store data in direct memory; however, it won't be faster than storing data in POJOs (or as safe or readable or maintainable). If you are worried about GC, try creating your objects (they have to be mutable) in advance and reusing them without discarding them. If you don't discard your objects, there is nothing to collect.
Is Java native memory faster (I think this entails ByteBuffer objects?) than traditional heap memory when there are no serialization issues (for instance if I am comparing it with a huge byte[])?
Direct memory can be faster than using a byte[] if you use non-byte types like int, as it can read/write all four bytes without turning the data into individual bytes.
However, it is slower than using POJOs, as it has to bounds-check every access.
Or do the vagaries of garbage collection, etc. render this question unanswerable?
The speed has nothing to do with the GC. The GC only matters when creating or discarding objects.
BTW: if you minimise the number of objects you discard and increase your Eden size, you can prevent even minor collections from occurring for a long time, e.g. a whole day.
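For example, on HotSpot the young generation (Eden plus survivor spaces) can be sized explicitly; the values here are placeholders:

java -Xmn2g -Xms3g -Xmx3g YourApp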
The point of BigMemory is not that native memory is faster, but rather, it's to reduce the overhead of the garbage collector having to go through the effort of tracking down references to memory and cleaning it up. As your heap size increases, so do your GC intervals and CPU commitment. Depending upon the situation, this can create a sort of "glass ceiling" where the Java heap gets so big that the GC turns into a hog, taking up huge amounts of processor power each time the GC kicks in. Also, many GC algorithms require some level of locking that means nobody can do anything until that portion of the GC reference tracking algorithm finishes, though many JVM's have gotten much better at handling this. Where I work, with our app server and JVM's, we found that the "glass ceiling" is about 1.5 GB. If we try to configure the heap larger than that, the GC routine starts eating up more than 50% of total CPU time, so it's a very real cost. We've determined this through various forms of GC analysis provided by our JVM vendor.
BigMemory, on the other hand, takes a more manual approach to memory management. It reduces the overhead and sort of takes us back to having to do our own memory cleanup, as we did in C, albeit in a much simpler approach akin to a HashMap. This essentially eliminates the need for a traditional garbage collection routine, and as a result, we eliminate that overhead. I believe that the Terracotta folks used native memory via a ByteBuffer as it's an easy way to get out from under the Java garbage collector.
The following whitepaper has some good info on how they architected BigMemory and some background on the overhead of the GC: http://www.terracotta.org/resources/whitepapers/bigmemory-whitepaper.
I'm intrigued by what could be accomplished with native memory if the serialization issue could be bypassed.
I think that your question is predicated on a false assumption. AFAIK, it is impossible to bypass the serialization issue that they are talking about here. The only thing you could do would be to simplify the objects that you put into BigMemory and use custom serialization / deserialization code to reduce the overheads.
While benchmarks might give you a rough idea of the overheads, the actual overheads will be very application specific. My advice would be:
Only go down this route if you know you need to. (You will be tying your application to a particular implementation technology.)
Be prepared for some intrusive changes to your application if the data involved isn't already managed as a cache.
Be prepared to spend some time in (re-)tuning your caching code to get good performance with BigMemory.
If your data structures are complicated, expect proportionately larger runtime overheads and tuning effort.

Java Profiling, Performance Tuning and Memory Profiling exercises

I am about to conduct a workshop on profiling, performance tuning, memory profiling, memory leak detection, etc. of Java applications using JProfiler and Eclipse TPTP.
I need a set of exercises that I could offer to participants where they can:
Use the tool to profile and discover the problem: bottleneck, memory leak, suboptimal code, etc. I am sure there is plenty of experience and there are real-life examples around.
Resolve the problem and implement optimized code
Demonstrate the solution by performing another session of profiling
Ideally, write the unit test that demonstrates the performance gain
Neither problems nor solutions should be overly complicated; it should be possible to resolve them in a matter of minutes at best and a matter of hours at worst.
Some interesting areas to exercise:
Resolve memory leaks
Optimize loops
Optimize object creation and management
Optimize string operations
Resolve problems exacerbated by concurrency and concurrency bottlenecks
Ideally, exercises should include sample unoptimized code and the solution code.
I tried to find real-life examples that I've seen in the wild (maybe slightly altered, but the basic problems were all very real). I've also tried to cluster them around the same scenario, so you can build up a session easily.
Scenario: you have a time-consuming function that you want to run many times for different values, but the same values may pop up again (ideally not too long after they were first used). A good and simple example is URL/web-page pairs that you need to download and process (for the exercise this should probably be simulated).
Loops:
You want to check if any of a set of words pops up in the pages. Use your function in a loop, but with the same value, pseudo code:
for (String word : words) {
    checkWord(word, download(url));   // downloads the same page again on every iteration
}
One solution is quite easy: just download the page before the loop.
The other solution is the cache described below.
Memory leak:
Simple one: you can also solve your problem with a kind of cache. In the simplest case you can just put the results into a (static) map. But if you don't prevent it, its size will grow infinitely -> memory leak.
Possible solution: use an LRU map. Most likely performance will not degrade too much, but the memory leak should go away.
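One common way to sketch such an LRU map with plain JDK classes (the capacity is illustrative, and this is not thread-safe):

import java.util.LinkedHashMap;
import java.util.Map;

// Access-ordered LinkedHashMap that evicts the least-recently-used entry once it grows too big.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final int MAX_ENTRIES = 1000;

    LruCache() {
        super(16, 0.75f, true);   // true = access order, which is what makes it LRU
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > MAX_ENTRIES;
    }
}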
Trickier one: say you implement the previous cache using a WeakHashMap, where the keys are the URLs (NOT as strings, see later) and the values are instances of a class that contains the URL, the downloaded page and something else. You may assume that it should be fine, but in fact it is not: since the value (which is not weakly referenced) has a reference to the key (the URL), the key will never be eligible for cleanup -> nice memory leak.
Solution: remove the URL from the value.
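A minimal sketch of the leaking shape described above (class and field names are illustrative):

import java.net.URL;
import java.util.Map;
import java.util.WeakHashMap;

// The value holds a strong reference back to the key, so no entry can ever be reclaimed.
final class PageCache {
    static final class Entry {
        final URL url;        // <-- strong reference to the key: this is the leak
        final String page;
        Entry(URL url, String page) { this.url = url; this.page = page; }
    }

    private final Map<URL, Entry> cache = new WeakHashMap<>();

    void put(URL url, String page) { cache.put(url, new Entry(url, page)); }
    Entry get(URL url)             { return cache.get(url); }
}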
Same as before, but the URLs are interned strings ("to save some memory if we happen to have the same strings again"), and the value does not refer to the key. I did not try it, but it seems to me that it would also cause a leak, because interned Strings cannot be GC'd.
Solution: do not intern, which will also lead to the advice that you must not skip: don't do premature optimization, as it is the root of all evil.
Object creation & Strings:
Say you want to display only the text of the pages (~remove HTML tags). Write a function that does it line by line and appends to a growing result. At first the result should be a String, so appending will take a lot of time and object allocation. You can detect this problem from a performance point of view (why appends are so slow) and from an object creation point of view (why we created so many Strings, StringBuffers, arrays, etc.).
Solution: use a StringBuilder for the result.
Concurrency:
You want to speed the whole thing up by doing the downloading/filtering in parallel. Create some threads and run your code using them, but do everything inside a big synchronized block (on the cache), just "to protect the cache from concurrency problems". The effect should be that you effectively use just one thread, as all the others are waiting to acquire the lock on the cache.
Solution: synchronize only around the cache operations (e.g. use java.util.Collections.synchronizedMap()).
Synchronize all the tiny little pieces of code. This should kill performance and probably prevent normal parallel execution. If you are lucky/smart enough you can come up with a deadlock as well.
Moral of this: synchronization should not be an ad hoc thing, done on an "it will not hurt" basis, but a well-thought-out thing.
Bonus exercise:
Fill up your cache at the beginning and don't do too much allocation afterwards, but still have a small leak somewhere. Usually this pattern is not too easy to catch. You can use the "bookmark" or "watermark" feature of the profiler, setting the mark right after the caching is done.
Don't ignore this method because it works very well for any language and OS, for these reasons. An example is here. Also, try to use examples with I/O and significant call depth. Don't just use little cpu-bound programs like Mandelbrot. If you take that C example, which isn't too large, and recode it in Java, that should illustrate most of your points.
Let's see:
Resolve memory leaks.
The whole point of a garbage collector is to plug memory leaks. However, you can still allocate too much memory, and that shows up as a large percent of time in "new" for some objects.
Optimize loops.
Generally loops don't need to be optimized unless there's very little done inside them (and they take a good percent of time).
Optimize object creation and management.
The basic approach here is: keep data structure as simple as humanly possible. Especially stay away from notification-style attempts to keep data consistent, because those things run away and make the call tree enormously bushy. This is a major reason for performance problems in big software.
Optimize string operations.
Use StringBuilder, but don't sweat code that doesn't use a solid percent of execution time.
Concurrency.
Concurrency has two purposes.
1) Performance, but this only works to the extent that it allows multiple pieces of hardware to get cranking at the same time. If the hardware isn't there, it doesn't help. It hurts.
2) Clarity of expression, so for example UI code doesn't have to worry about heavy calculation or network I/O going on at the same time.
In any case, it can't be emphasized enough, don't do any optimization before you've proved that something takes a significant percent of time.
I have used JProfiler for profiling our application. But it hasn't been of much help. Then I used JHat. Using JHat you cannot see the heap in real time. You have to take a heap dump and then analyse it. Using the OQL (Object Query Language) is a good technique to find heap leaks.
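For example, a classic query from the jhat OQL help finds long strings (note that s.count relies on pre-Java 7 String internals, so it may need adjusting on newer JDKs):

select s from java.lang.String s where s.count >= 100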

sun.java2d.loops.ProcessPath$Point

I am profiling an application that is suddenly using a lot of memory, and I am getting this:
sun.java2d.loops.ProcessPath$Point
As being allocated almost 11,000,000 times.
What is it, and is there a solution to this?
My initial response would be to question whether this is actually using a lot of memory/CPU cycles. The sun.* packages are internal implementations of Sun's JVM, so they're likely to be low-level details of what your code is doing. If these objects are taking up a vast amount of memory that might be an issue, but simply seeing 11 million allocations is no indication that anything is out of the ordinary.
Edit: a little Googling seems to show that this is an object used to encode a reference to a particular point on a 2D plane. Chances are that if you're doing anything that involves graphics then yes, you'd have a large number of them generated. Additionally, each one only stores two integers (x and y coordinates) and a boolean, so they are going to be very small objects in the grand scheme of things. Even if none of those 11 million allocations were garbage collected (and I expect the majority will have been local variables, so will have been quickly collected), they're not going to account for a large part of the heap unless you're running on devices with tiny amounts of RAM.
In other words, look elsewhere for your problem. It would probably be more helpful to look at objects that are taking up a large amount of the current heap space, or even look at the number of objects currently referenced, in order to find your leak. Read documents giving guidelines on how to find and quash memory leaks with your tool(s) of choice. Looking at total allocations is rarely that useful, unless you know for a given class how many there should be (e.g. it can be good to check that singletons are only created once, for example).
I solved the memory problem. I was doing some nasty reference handling in some places in my code.
