I'm a complete beginner at profiling, so please tell me what you do to profile your applications. Which is better: profiling the whole application, or profiling a part in isolation? If isolation is the right choice, how do you do that?
As far as possible, profile the entire application, running a real (typical) workload. Anything else and you risk getting results that lead you to focus your optimization efforts in the wrong place.
EDIT
Isn't it hard to get a correct result when profiling the whole application? The test results then depend on user interaction (button clicking etc.) rather than on an automated task. Tell me if I'm wrong.
Getting the "correct result" depends on how you interpret the profiling data. For instance, if you are profiling an interactive application, you should figure out which parts of the profile correspond to waiting for user interaction, and ignore them.
There are a number of problems with profiling your application in parts. For example:
By deciding beforehand which parts of the application to profile, you don't get a good picture of the relative contribution of the different parts, and you risk wasting effort on the wrong parts.
You pretty much have to use artificial workloads. Whenever you do that there is a risk that the workloads are not representative of "normal" workloads, and your profiling results are biased.
In many applications, the bottlenecks are due to the way that the parts of the application interact with each other, or with I/O or garbage collection. Profiling different parts of the application separately is likely to miss these interactions.
... what I am looking for is the technique.
Roughly speaking, you start with the biggest "hotspots" identified by the profile data and drill down until you've figured out why so much time is being spent in a certain area. It really helps if your profiling tool can aggregate and present the data both top-down and bottom-up.
But, at the end of the day going from the profiling evidence (hotspots, stack snapshots, etc) to the root cause and the remedy is often down to the practical knowledge and intuition that comes from experience.
(Yea ... I'm waffling a bit. But my point is that there is no magic formula for doing this. Ultimately, you've got to use your brain ... like you have to when debugging a complex application.)
First I just time it with a watch to get an overall measurement.
Then I run it under a debugger and take stackshots. What these do is tell me which lines of code are responsible for large fractions of time. In particular, this means lines where functions are called without really needing to be, and I/O that I may not have been aware of.
Since it shows me lines of code that take time and can be done a better way, I fix those.
Then I start over at the top and see how much time I actually saved. I repeat these steps until I can no longer find things that a) take significant % of time, and b) I can fix.
This has been called "poor man's profiling". The little secret is not only is it cheap, but it is very effective, because it avoids the common myths about profiling.
P.S. If it is an interactive application, do all this just to the part of it that is slow, like if you press a "Do Useful Stuff" button, and it finishes a few seconds later. There's no point to taking stackshots when it's waiting for YOU.
P.P.S. Suppose there is some activity that should be faster, but finishes too quickly to take stackshots, like if it takes a second but should take a fraction of a second. Then what you can do is (temporarily) wrap a for loop around it, of 10 or 100 iterations. That will make it take long enough to get samples. After you've speeded it up, remove the loop.
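A minimal sketch of that temporary loop (doUsefulStuff() is a placeholder for whatever the too-fast activity is):

// Temporary scaffolding, for sampling only -- remove it once you've sped things up.
for (int i = 0; i < 100; i++) {
    doUsefulStuff(); // placeholder for the activity that finishes too quickly on its own
}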
Take a look at http://www.ej-technologies.com/products/jprofiler/overview.html
I'm aware that, for good practice, StringBuilder should be initialised with a capacity value for the expected content. Otherwise, growing the buffer at runtime is going to be an expensive operation.
My question is, if we don't know the expected size, how should one go about it? Is there a standard value/way to avoid expensive operations under the hood?
If not, is there potentially a way of alarming/logging in the code if the capacity is bigger than the value given upon initialisation?
I'm aware that, for good practice, StringBuilder should be initialised with a capacity value for the expected content. Otherwise, growing the buffer at runtime is going to be an expensive operation.
This is a wildly incorrect statement. It is very bad practice to do this. Even if you know exactly how large it'll be.
If I see this code:
StringBuilder sb = new StringBuilder(in1.length() + in2.length() * 3 + (loaded ? suffixLen : 0));
Then this is an additional thing to worry about, test, and keep up to date. I would assume, if all this is present, that for whatever reason somebody did some performance testing and actually figured out that this saves a worthwhile chunk of cycles, and somehow, in a fit of idiocy, neglected to write an enlightening comment and link to the JMH or profiler result analysis that verifies this conclusion.
So I'd either painstakingly attempt to verify by hand that the calculation is still correct after every update to this code; or I'd fix the problem and add the tests (and then be utterly befuddled when, of course, the profile review shows this code is utterly inconsequential); or I'd go to the considerable trouble of writing an assert-based test case that runs the entire operation and verifies at the end that the size calculation done at the top is, in fact, correct.
I don't think you fully grasp why the hyperbolic "premature optimization is the root of all evil" statement is so popular.
Here's the problem. 99% of the system's resources are spent on 1% of the code. That's not an exaggeration; in fact, that is likely understating the issue.
Developer time is not infinite, and even if it were, a programmer's ability to comprehend code and focus on the relevant parts is limited, because in the end they are human. Adding code that needs to be parsed and understood by human eyeballs and brains is therefore bad if that code does something irrelevant. We're literally talking about the same order of magnitude as throwing a glass of water into the ocean at the seashore in Europe and then watching the water level rise in Manhattan. Beyond any and all ability to measure, and utterly incomprehensible. Bold does not do sufficient justice to how little it matters. Even if this code runs 100 million times a day for 15 years, it amounts to perhaps 5 cents in IaaS deployment costs over that entire decade and a half, and that's if there is even a performance impact, which often there isn't, because modern VMs, GCs, OSes, and CPU architectures get up to some crazy shenanigans.
Furthermore, the system optimizes. Optimizers, such as the JVM's HotSpot engine, are in the end pattern-matching machines. They find commonly used patterns and recognize how to run them as efficiently as possible. If you write code in ways that nobody else does, it is highly unlikely to actually outperform the common (idiomatic) case. Most likely because it just doesn't matter, and even when it does, because the idiomatic case gets optimized much more readily.
Here is a trivial example:
List<String> someList = new ArrayList<String>();
for (int i = 0; i < 10000; i++) someList.add(someRandomString);
String[] arrayForm = someList.toArray(new String[0]);
Here you may go: Huh, well, we can optimize this code a little bit by passing new String[10000] instead; this saves the system from having to allocate an admittedly small object (a zero-size string array).
You would be wrong. The above code, with new String[0], is in fact faster. How can that be? Optimizers, with pattern matching. They recognize the pattern and realize that the system can create a new array of the requisite size, skip zeroing it out, and then run the code that fills it. The optimization patterns simply do not include the new String[reqSize] variant, where the system could in theory also allocate the array and omit the zeroing. (The JVM spec's zeroing guarantee merely means you can never observe that the array wasn't zeroed out; it doesn't actually require the JVM to zero it, and that is where the pattern optimization of not doing so comes from.) However, it doesn't do that for the explicit-size variant: not common enough, and somewhat more complicated.
I'm not saying that new StringBuilder() is necessarily faster than new StringBuilder(knownSize). I'm saying that:
99.9% of the time it literally does not make one iota of difference. Not a single nanosecond: the speedup is entirely theoretical, and no performance test of any stripe can detect the difference. If a tree falls in the forest, and all that.
You have no idea when that 0.1% of the time even occurs, or whether it's actually never: a flat 0%. Between a CPU that caches and pipelines (did you know modern CPUs cannot access memory directly? At all? I bet you didn't. The basic von Neumann model of how CPUs work totally misleads you if you use it to analyse the performance of machine code), VMs, and garbage collectors (did you know that garbage is free but live objects are expensive? Re-using an object can in fact be more expensive than creating a ton of short-lived garbage... depending on many factors; of course this too is an oversimplification. That's the real point: this is an intractable thing; you cannot just look at code and jump to conclusions about performance), you stand no chance of knowing what's 'faster'.
The only right move is to write code as simple and as clean as you can ('clean' defined as: when you look at it, you jump to conclusions, and those conclusions are correct; it is easy to adjust in the face of changing requirements, and flexible in how it connects to the rest of the codebase). IF (big if!) real-life use reveals a performance issue, you first run a profiler so you know the 1% of the code that is in any way relevant, and then you go ham on that, with JMH benchmarks and all sorts of performance experiments to optimize the heck out of it. If your code is clean, that's great, because almost always this requires adjusting the code that calls into the 'hot path' or the code the flow exits into, and the cleaner your code is, the easier that will be.
Needless performance optimization almost invariably reduces flexibility, and makes code harder to understand.
Hence, objectively, micro-optimizing like this just makes your code slower and buggier for literally no benefit. Not even a tiny, almost immeasurable one.
Hence, the advice is silly. The only correct call is new StringBuilder() - no pre-configured size. The one and only excuse you have to write new StringBuilder(presetCapacity) is if there's a lengthy comment that immediately precedes it that lays out in a lot of detail, or links to a ticket, the exact performance study done to indicate this indeed fixes a real performance issue and how to recreate that study, and on what schedule it should be revisited.
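If you ever do suspect a particular StringBuilder really matters, the JMH benchmarks mentioned above are the way to find out. A minimal sketch (the class name and fields are made up for illustration; run it under the JMH harness):

import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class StringBuilderBench {
    String a = "hello, ";
    String b = "world";

    @Benchmark
    public String defaultCapacity() {
        return new StringBuilder().append(a).append(b).toString();
    }

    @Benchmark
    public String presetCapacity() {
        return new StringBuilder(a.length() + b.length())
                .append(a).append(b).toString();
    }
}

Only a measured difference from something like this, written up in that comment, would justify the preset capacity.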
Assume I have a loop (any while or for) like this:
while (condition) {
    // a long block of code
}
From the point of view of time complexity, should I divide this code into parts, write a function outside the loop, and call that function repeatedly?
I read something about functions very long ago, that calling a function repeatedly takes more time or memory or something like that; I don't remember it exactly. Can you also provide some good references about things like this (time complexity, coding style)?
Can you also provide some reference book or tutorial about heap memory, overheads, etc. that affect the performance of a program?
The performance difference is probably very minimal in this case. I would concentrate on clarity rather than performance until you identify this portion of your code to be a serious bottleneck.
It really does depend on what kind of code you're running in the loop, however. If you're just doing a tiny mathematical operation that isn't going to take any CPU time, but you're doing it a few hundred thousand times, then inlining the calculation might make sense. Anything more expensive than that, though, and performance shouldn't be an issue.
There is an overhead of calling a function.
So if the "long code" is fast compared to this overhead (and your application cares about performance), then you should definitely avoid the overhead.
However, if the performance is not noticeably worse, it's better to make the code more readable by using a function (or better, multiple functions).
Rule one of performance optimisation: measure it.
Personally, I go for readable code first and then optimise it IF NECESSARY. Usually, it isn't necessary :-)
See the first line in CHAPTER 3 - Measurement Is Everything
"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil." - Donald Knuth
In this case, the difference in performance will probably be minimal between the two solutions, so writing clearer code is the way to do it.
There really isn't a simple "tutorial" on performance; it is a very complex subject, and one that even seasoned veterans often don't fully understand. Anyway, to give you more of an idea of what the overhead of "calling" a function is: basically you are "freezing" the state of your function (in Java there are no "functions" per se; they are all called methods), calling the method, then "unfreezing" where your method was before.
The "freezing" essentially consists of pushing state information(where you were in the method, what the value of the variables was etc) on to the stack, "unfreezing" consists of popping the saved state off the stack and updating the control structures to where they were before you called the function. Naturally memory operations are far from free, but the VM is pretty good at keeping the performance impact to an absolute minimum.
Now keep in mind Java is almost entirely heap-based: the only things that really have to be pushed onto the stack are pointer values (small), your place in the program (again small), whatever primitives are local to your method, and a tiny bit of control information, nothing else. Furthermore, although you cannot explicitly inline in Java (though I'm sure there are bytecode editors out there that essentially let you do that), most VMs, including the popular HotSpot VM, will do this automatically for you. http://java.sun.com/developer/technicalArticles/Networking/HotSpot/inlining.html
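To make that concrete, a minimal sketch: extracting a tiny helper like square below out of a loop body typically costs nothing at runtime, because it is a textbook candidate for the JIT's automatic inlining.

static int square(int x) {
    return x * x;
}

static long sumOfSquares(int n) {
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += square(i); // HotSpot will typically inline this call away
    }
    return total;
}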
So the bottom line is pretty much zero performance impact. If you want to verify it for yourself, you can always run benchmarking and profiling tools; they should be able to confirm it for you.
From an execution-speed point of view it shouldn't matter, and if you still believe this is a bottleneck, it is easy to measure.
From a development performance perspective, it is a good idea to keep the code short. I would vote for turning the loop contents into one (or more) properly named methods.
Forget it! You can't gain any performance by doing the JIT's job for it. Let the JIT inline it for you. Keep methods short for readability, and also for performance, as the JIT works better with short methods.
There are micro-optimizations which may help you gain some performance, but don't even think about them. I suggest the following rules:
Write clean code using appropriate objects and algorithms for readability and for performance.
In case the program is too slow, profile and identify the critical parts.
Think about improving them using better objects and algorithms.
As a last resort, you may also consider micro-optimizations.
Just curious to know if there is a list of steps I can use as guidelines to debug performance issues and pinpoint what is taking the most time. There is a myriad of tools, starting with logging, timing methods, load-test tools, timing database queries, and so on...
Considering that there are so many different things, is there a list of things that belong at the top?
If so, please let me know.
Check the machine physically has enough RAM. Paging will kill applications.
Check what proportion of the application's time is spent in garbage collection (a quick way to check this is sketched after this list). A high proportion means you'll benefit from heap or GC tuning.
Run the app in a profiler and put it through its paces, monitoring CPU usage. Look for those methods where the CPU spends all its time. A good profiler will let you roll up the time spent in third party code where you have no control, allowing you to identify the hot spots in your own code.
For the top hot spots in your application, work out where the time is being spent. Is it all I/O? Calculations? Multi-thread synchronisation? Object allocation/deallocation? A poor algorithm with n-squared or worse complexity? Something else?
Deal with the hot spots one at a time, and each time you change something, measure and work out whether you've actually improved anything. Roll back the ineffective changes and work out where the problem has moved to if the change worked.
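For the garbage-collection check in the second point, here is a quick sketch using the standard JMX management beans:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeCheck {
    public static void main(String[] args) {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcMillis += Math.max(0, gc.getCollectionTime()); // cumulative GC time in ms
        }
        long upMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC time: %d ms of %d ms uptime (%.1f%%)%n",
                gcMillis, upMillis, 100.0 * gcMillis / upMillis);
    }
}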
There is nothing really specific to Java about something like this, with any language/framework/tool you should follow the same pattern:
Measure the performance before you change a single thing
Hypothesize about possible causes/fixes
Implement the change
Measure performance after the change to compare with #1
Repeat until happy
Measure
MEASURE!!!!!
Compare apples to apples. Don't run your tests on a busy subnet (and especially don't try to justify this ludicrous practice by saying "I want the circumstances to be realistic").
Measure: capture time stamps at each discrete step (a minimal sketch follows this list).
Note that although there is a relationship, throughput and response time are not the same thing
After you make a change... MEASURE!!!!! Never say to yourself, "it seems better." How do you know it's better? Compare measurement 1 to measurement 2.
Test one thing at a time. Don't create one uber performance suite that attempts to simulate realistic conditions. It's too much, and you are setting yourself up to be overwhelmed. Test for message size. Test for concurrency. Test in isolation.
Once you start to isolate the bottlenecks, the next steps will start to feel more natural and fine-tuning your tests will become easier. At that point you may choose to hook up a profiler to investigate GC/CPU performance and memory consumption (VisualVM is good, and free). The point is to treat performance issues like a binary search: start by measuring everything and continually subdivide the problem until it reveals itself.
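The promised sketch of capturing time stamps at each discrete step (the step methods here are hypothetical):

long t0 = System.nanoTime();
parseRequest();    // hypothetical step
long t1 = System.nanoTime();
queryDatabase();   // hypothetical step
long t2 = System.nanoTime();
renderResponse();  // hypothetical step
long t3 = System.nanoTime();
System.out.printf("parse=%dus query=%dus render=%dus%n",
        (t1 - t0) / 1_000, (t2 - t1) / 1_000, (t3 - t2) / 1_000);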
The first and most important step in any kind of performance tuning is to identify what is slow, and measure just how slow it is. In most cases (particularly if the performance problem is easy to reproduce), a profiler is the most effective tool for that, as it will give you detailed statistics on execution time, breaking it down to single methods, without your having to manually instrument the program.
Check DB queries.
Check statements inside loops: creating a new statement on every iteration makes the application slow; use a PreparedStatement or CallableStatement instead (see the sketch after this list).
Capture time stamps at each discrete step.
Identify the hot spots where the time is being spent, such as I/O, calculations, multi-threaded synchronization and garbage collection, and look for poor algorithms.
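A sketch of the statements-in-loops point (it assumes an open java.sql.Connection named conn and a hypothetical users table):

try (PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO users(name) VALUES (?)")) {
    for (String name : names) {
        ps.setString(1, name); // reuse one compiled statement for every row
        ps.addBatch();
    }
    ps.executeBatch(); // one round trip per batch instead of one per row
}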
After watching Joshua Bloch's presentation "Performance Anxiety", I read the paper he suggested in it, "Evaluating the Accuracy of Java Profilers". Quoting the conclusion:
Our results are disturbing because they indicate that profiler incorrectness is pervasive—occurring in most of our seven benchmarks and in two production JVMs—and significant—all four of the state-of-the-art profilers produce incorrect profiles. Incorrect profiles can easily cause a performance analyst to spend time optimizing cold methods that will have minimal effect on performance. We show that a proof-of-concept profiler that does not use yield points for sampling does not suffer from the above problems.
The conclusion of the paper is that we cannot really believe the results of profilers. But then, what is the alternative to using profilers? Should we go back to just using our gut feeling for optimization?
UPDATE: A point that seems to be missed in the discussion is the observer effect. Can we build a profiler that is really free of the observer effect?
Oh, man, where to begin?
First, I'm amazed that this is news. Second, the problem is not that profilers are bad, it is that some profilers are bad.
The authors built one that, they feel, is good, just by avoiding some of the mistakes they found in the ones they evaluated.
Mistakes are common because of some persistent myths about performance profiling.
But let's be positive.
If one wants to find opportunities for speedup, it is really very simple:
Sampling should be uncorrelated with the state of the program.
That means happening at a truly random time, regardless of whether the program is in I/O (except for user input), or in GC, or in a tight CPU loop, or whatever.
Sampling should read the function call stack,
so as to determine which statements were "active" at the time of the sample.
The reason is that every call site (point at which a function is called) has a percentage cost equal to the fraction of time it is on the stack.
(Note: the paper is concerned entirely with self-time, ignoring the massive impact of avoidable function calls in large software. In fact, the reason behind the original gprof was to help find those calls.)
Reporting should show percent by line (not by function).
If a "hot" function is identified, one still has to hunt inside it for the "hot" lines of code accounting for the time. That information is in the samples! Why hide it?
An almost universal mistake (that the paper shares) is to be concerned too much with accuracy of measurement, and not enough with accuracy of location.
For example, here is a case of performance tuning in which a series of performance problems were identified and fixed, resulting in a compounded speedup of 43 times.
It was not essential to know precisely the size of each problem before fixing it, but to know its location.
A phenomenon of performance tuning is that fixing one problem, by reducing the time, magnifies the percentages of remaining problems, so they are easier to find.
As long as any problem is found and fixed, progress is made toward the goal of finding and fixing all the problems.
It is not essential to fix them in decreasing size order, but it is essential to pinpoint them.
On the subject of statistical accuracy of measurement: if a call point is on the stack some fraction F of the time (say 20%), and N (say 100) random-time samples are taken, then the number of samples that show the call point follows a binomial distribution, with mean = NF = 20 and standard deviation = sqrt(NF(1-F)) = sqrt(100 * 0.2 * 0.8) = sqrt(16) = 4. So the percentage of samples that show it will be 20% +/- 4%.
So is that accurate? Not really, but has the problem been found? Precisely.
In fact, the larger a problem is, in terms of percent, the fewer samples are needed to locate it. For example, if 3 samples are taken, and a call point shows up on 2 of them, it is highly likely to be very costly.
(Specifically, it follows a beta distribution. If you generate 4 uniform(0,1) random numbers and sort them, the distribution of the 3rd smallest is the distribution of cost for that call point. Its mean is (2+1)/(3+2) = 0.6, so that is the expected savings, given those samples.)
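If you want to convince yourself of that order-statistic claim, a quick Monte Carlo sketch:

import java.util.Arrays;
import java.util.Random;

public class OrderStatCheck {
    public static void main(String[] args) {
        Random rnd = new Random();
        double sum = 0;
        int trials = 1_000_000;
        for (int t = 0; t < trials; t++) {
            double[] u = {rnd.nextDouble(), rnd.nextDouble(),
                          rnd.nextDouble(), rnd.nextDouble()};
            Arrays.sort(u);
            sum += u[2]; // the 3rd smallest of the 4
        }
        // Should print a value close to (2+1)/(3+2) = 0.6
        System.out.println(sum / trials);
    }
}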
INSERTED: And the speedup factor you get is governed by another distribution, BetaPrime, whose average is 4. So if you take 3 samples, see a problem on 2 of them, and eliminate that problem, on average you will make the program four times faster.
It's high time we programmers blew the cobwebs out of our heads on the subject of profiling.
Disclaimer - the paper failed to reference my article: Dunlavey, “Performance tuning with instruction-level cost derived from call-stack sampling”, ACM SIGPLAN Notices 42, 8 (August, 2007), pp. 4-8.
If I read it correctly, the paper only talks about sample-based profiling. Many profilers also do instrumentation-based profiling. It's much slower and has some other problems, but it should not suffer from the biases the paper talks about.
The conclusion of the paper is that we cannot really believe the result of profilers. But then, what is the alternative of using profilers.
No. The conclusion of the paper is that current profilers' measuring methods have specific defects. They propose a fix. The paper is quite recent. I'd expect profilers to implement this fix eventually. Until then, even a defective profiler is still much better than "feeling".
Unless you are building bleeding-edge applications that need every CPU cycle, I have found that profilers are a good way to find the slowest 10% of your code. As a developer, I would argue that should be all you really care about in nearly all cases.
I have experience with http://www.dynatrace.com/en/ and I can tell you it is very good at finding the low hanging fruit.
Profilers are like any other tool and they have their quirks but I would trust them over a human any day to find the hot spots in your app to look at.
If you don't trust profilers, then you can go into paranoia mode by using aspect-oriented programming, wrapping around every method in your application and then using a logger to log every method invocation.
Your application will really slow down, but at least you'll have a precise count of how many times each method is invoked. If you also want to see how long each method takes to execute, wrap every method with perf4j.
After dumping all these statistics to text files, use some tools to extract the necessary information and visualize it. I'd guess this will give you a pretty good overview of how slow your application is in certain places.
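For what it's worth, a sketch of that paranoia-mode idea in AspectJ's annotation style (the package name and output format are just for illustration):

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class TimingAspect {
    // Time every method of every type under a hypothetical com.example package.
    @Around("execution(* com.example..*.*(..))")
    public Object time(ProceedingJoinPoint pjp) throws Throwable {
        long start = System.nanoTime();
        try {
            return pjp.proceed();
        } finally {
            System.out.println(pjp.getSignature() + " took "
                    + (System.nanoTime() - start) / 1_000 + " us");
        }
    }
}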
Actually, you are better off profiling at the database level. Most enterprise databases come with the ability to show the top queries over a period of time. Start working on those queries until the top ones are down to 300 ms or less, and you will have made great progress. Profilers are useful for showing behavior of the heap and for identifying blocked threads, but I personally have never gotten much traction with the development teams on identifying hot methods or large objects.
I am about to conduct a workshop on profiling, performance tuning, memory profiling, memory leak detection, etc. of Java applications using JProfiler and Eclipse TPTP.
I need a set of exercises that I could offer to participants where they can:
Use the tool to profile and discover the problem: a bottleneck, memory leak, suboptimal code, etc. I am sure there is plenty of experience and there are real-life examples around.
Resolve the problem and implement optimized code
Demonstrate the solution by performing another session of profiling
Ideally, write the unit test that demonstrates the performance gain
Neither the problems nor the solutions should be overly complicated; it should be possible to resolve them in a matter of minutes at best and a matter of hours at worst.
Some interesting areas to exercise:
Resolve memory leaks
Optimize loops
Optimize object creation and management
Optimize string operations
Resolve problems exacerbated by concurrency and concurrency bottlenecks
Ideally, exercises should include sample unoptimized code and the solution code.
I try to find real-life examples that I've seen in the wild (maybe slightly altered, but the basic problems were all very real). I've also tried to cluster them around the same scenario, so you can build up a session easily.
Scenario: you have a time-consuming function that you want to run many times for different values, but the same values may pop up again (ideally not too long after they first appeared). A good and simple example is URL/web-page pairs that you need to download and process (for the exercise this should probably be simulated).
Loops:
You want to check whether any of a set of words pops up in the pages. Use your function in a loop, but with the same value; pseudo-code:
for (String word : words) {
    checkWord(download(url), word); // re-downloads the same page on every iteration
}
One solution is quite easy: just download the page once, before the loop.
Another solution is below.
Memory leak:
A simple one: you can also solve the problem with a kind of cache. In the simplest case you can just put the results into a (static) map. But if you don't prevent it, its size will grow infinitely -> memory leak.
Possible solution: use an LRU map. Most likely performance will not degrade too much, but the memory leak should go away.
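A minimal LRU cache sketch built on LinkedHashMap's removeEldestEntry hook:

import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order, i.e. LRU
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict once over the cap
    }
}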
A trickier one: say you implement the previous cache using a WeakHashMap, where the keys are the URLs (NOT as strings, see later) and the values are instances of a class that contains the URL, the downloaded page and something else. You may assume it should be fine, but in fact it is not: since the value (which is not weakly referenced) holds a reference to the key (the URL), the key will never become eligible for cleanup -> nice memory leak.
Solution: remove the URL from the value.
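A sketch of the trap (PageInfo is a made-up value class): the value strongly references its own key, so the weak reference never clears and the WeakHashMap can never evict the entry.

import java.net.URL;
import java.util.Map;
import java.util.WeakHashMap;

class PageInfo {
    final URL url;      // strong back-reference to the key -> the leak
    final String page;
    PageInfo(URL url, String page) { this.url = url; this.page = page; }
}

// Inside some method:
Map<URL, PageInfo> cache = new WeakHashMap<>(); // entries can never be evicted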
Same as before, but the URLs are interned strings ("to save some memory if we happen to see the same strings again"), and the value does not refer to the URL. I did not try it, but it seems to me that this would also cause a leak, because interned Strings cannot be GC-ed.
Solution: do not intern, which also leads to the advice you must not skip: don't do premature optimization, as it is the root of all evil.
Object creation & Strings:
Say you want to display only the text of the pages (~remove the HTML tags). Write a function that does it line by line and appends to a growing result. At first the result should be a String, so appending will take a lot of time and object allocations. You can detect this problem from the performance point of view (why the appends are so slow) and from the object-creation point of view (why we created so many Strings, StringBuffers, arrays, etc.).
Solution: use a StringBuilder for the result.
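A sketch of the before and after (stripTags is a placeholder for the tag-removing logic):

// Slow: each += allocates a brand-new String, so the whole loop is quadratic.
String result = "";
for (String line : lines) {
    result += stripTags(line);
}

// Fast: one growing buffer, linear overall.
StringBuilder sb = new StringBuilder();
for (String line : lines) {
    sb.append(stripTags(line));
}
String text = sb.toString();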
Concurrency:
You want to speed the whole thing up by doing the downloading/filtering in parallel. Create some threads and run your code using them, but do everything inside a big synchronized block (based on the cache), just "to protect the cache from concurrency problems". The effect should be that you effectively use just one thread, as all the others are waiting to acquire the lock on the cache.
Solution: synchronize only around the cache operations (e.g. use java.util.Collections.synchronizedMap()).
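A sketch of the broken pattern versus the fix (Page, download() and the cache map are placeholders):

// Bad: slow network I/O runs while holding the cache lock, serializing all threads.
synchronized (cache) {
    Page p = cache.get(url);
    if (p == null) {
        p = download(url);
        cache.put(url, p);
    }
}

// Better: hold the lock only for the cache operations. Two threads may
// occasionally download the same page once each; that is usually an
// acceptable price.
Page p;
synchronized (cache) { p = cache.get(url); }
if (p == null) {
    p = download(url);
    synchronized (cache) { cache.put(url, p); }
}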
Synchronize every tiny little piece of code. This should kill performance and probably prevent normal parallel execution. If you are lucky/smart enough you can come up with a deadlock as well.
Moral of this: synchronization should not be an ad hoc thing, done on an "it will not hurt" basis, but a well-thought-out thing.
Bonus exercise:
Fill up your cache at the beginning and don't do too much allocation afterwards, but still leave a small leak somewhere. Usually this pattern is not too easy to catch. You can use the "bookmark" or "watermark" feature of the profiler, setting it right after the caching is done.
Don't ignore this method, because it works very well for any language and OS, for these reasons. An example is here. Also, try to use examples with I/O and significant call depth. Don't just use little CPU-bound programs like Mandelbrot. If you take that C example, which isn't too large, and recode it in Java, it should illustrate most of your points.
Let's see:
Resolve memory leaks.
The whole point of a garbage collector is to plug memory leaks. However, you can still allocate too much memory, and that shows up as a large percent of time in "new" for some objects.
Optimize loops.
Generally loops don't need to be optimized unless there's very little done inside them (and they take a good percent of time).
Optimize object creation and management.
The basic approach here is: keep data structure as simple as humanly possible. Especially stay away from notification-style attempts to keep data consistent, because those things run away and make the call tree enormously bushy. This is a major reason for performance problems in big software.
Optimize string operations.
Use StringBuilder, but don't sweat code that doesn't use a solid percentage of execution time.
Concurrency.
Concurrency has two purposes.
1) Performance, but this only works to the extent that it allows multiple pieces of hardware to get cranking at the same time. If the hardware isn't there, it doesn't help. It hurts.
2) Clarity of expression, so for example UI code doesn't have to worry about heavy calculation or network I/O going on at the same time.
In any case, it can't be emphasized enough, don't do any optimization before you've proved that something takes a significant percent of time.
I have used JProfiler for profiling our application, but it hasn't been of much help. Then I used JHat. With JHat you cannot see the heap in real time; you have to take a heap dump and then analyse it. Using OQL (Object Query Language) is a good technique for finding heap leaks.
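For example, one of the stock queries from jhat's built-in OQL help finds oversized strings (the exact field name depends on the JDK version the dump came from):

select s from java.lang.String s where s.count >= 100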