All,
Given a codebase whose functionality and implementation you know nothing about, how would you go about finding its performance bottlenecks? Please list any specific tools or standard approaches you would use.
I assume you have the source code, and that you can run it under a debugger, and that there is a "pause" button (or Ctrl-C, or Esc) with which you can simply stop it in its tracks.
I do that several times while it's making me wait, like 10 or 20, and each time study the call stack, and maybe some other state information, so I can give a verbal explanation of what it is doing and why.
That's the important thing - to know why it's doing what it's doing.
Typically what I see is that on, say, 20%, or 50%, or 90% of samples, it is doing something, and often that thing could be done more efficiently or not at all. So fixing that thing reduces execution time by (roughly) that percent.
The bigger a problem is, the quicker you see it.
In the limit, you can diagnose an infinite loop in 1 sample.
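For a JVM program, the manual pausing described above can be roughly approximated in code. The sketch below is only an illustration under stated assumptions, not part of the original answer: `StackSampler` and `sample` are hypothetical names, and it relies only on the standard `Thread.getStackTrace()` call. It samples one thread's stack a handful of times and reports the fraction of samples in which each method appeared.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not from the original answer): a crude in-process
// stand-in for pressing "pause" in a debugger several times.
class StackSampler {
    // Returns, for each "Class.method" seen, the fraction of non-empty
    // samples in which it was somewhere on the target thread's stack.
    static Map<String, Double> sample(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> hits = new HashMap<>();
        int taken = 0;
        for (int i = 0; i < samples; i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                taken++;
                // Count each method once per sample, even if it recurses.
                Set<String> seen = new HashSet<>();
                for (StackTraceElement frame : stack) {
                    seen.add(frame.getClassName() + "." + frame.getMethodName());
                }
                for (String method : seen) {
                    hits.merge(method, 1, Integer::sum);
                }
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break; // stop sampling early if we are interrupted
            }
        }
        Map<String, Double> fractions = new HashMap<>();
        for (Map.Entry<String, Integer> e : hits.entrySet()) {
            fractions.put(e.getKey(), e.getValue() / (double) taken);
        }
        return fractions;
    }
}
```

A method that shows up in about half the samples is on the stack about half the time. The numbers only point at where to look; reading the full stacks by hand is still what tells you why the time is being spent.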
This gets a lot of flak from profiler-aficionados, but people who try it know it works very well. It's based on different assumptions.
If you're looking for the elephant in the room, you don't need to measure him.
Here's a more detailed explanation, and a list of common myths.
The next best thing would be a wall-time stack sampler that reports percent at the line or instruction level, such as Zoom or LTProf, but they still leave you puzzling out the why.
Good luck.
You should use a profiling tool; which one depends on the platform:
.NET: Visual Studio performance tools, JetBrains dotTrace
Java: JProfiler
The above tools work very well for applications, but the features vary. For example, Visual Studio can summarize performance data based on tiers.
How to approach the problem is highly dependent on the type of the program, and the performance problem you're facing. But basically, you'll repeat the following cycle:
Record performance data (maybe change the settings for higher / lower granularity on recorded data)
Identify hot spots, where most of the application time is consumed
Maybe use reverse call tables to identify how the hot spot is invoked, and from where in the code
Try to refactor / optimize the hot spot
Start over, and check how much your optimization was effective.
It might take several iterations of the above cycle to get you to a point that you have acceptable performance.
Note that these tools provide many different features and ways to look at performance data, or record them. Provided that you don't have any knowledge of the internal structure of the application, you should start playing with different features and reports that the tools provide, so that you can pinpoint where to optimize.
Use differential analysis. Pick one part of the program and artificially slow it down (add a bunch of code that does nothing but waste time). Re-run your test and observe the results. Do this for a variety of aspects of your program. If adding the delays does not alter performance, then that aspect is not your bottleneck. The aspect that results in the largest performance hit might be the first place to look for bottlenecks.
This works even better if the severity of the delay code is adjustable while the program is running. You can increase and decrease the artificial delay and see how that affects the performance. If you encounter a test where the change in observed performance seems to follow the artificial delay linearly, then that aspect of the program might be your bottleneck.
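One way to get an adjustable delay is a small "knob" object that the code under test calls at the point being probed. This is a minimal sketch under assumptions: `DelayKnob`, `setDelayMicros` and `waste` are hypothetical names, and the delay is a busy-wait so that it burns CPU time rather than yielding the thread.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: an artificial delay whose size can be changed
// from another thread while the program is running.
class DelayKnob {
    private final AtomicLong delayNanos = new AtomicLong(0);

    // Adjust the artificial delay at runtime (0 turns it off).
    void setDelayMicros(long micros) {
        delayNanos.set(micros * 1_000L);
    }

    // Call this inside the part of the program being probed.
    void waste() {
        long delay = delayNanos.get();
        if (delay <= 0) {
            return;
        }
        long end = System.nanoTime() + delay;
        // Busy-wait until the deadline has passed.
        while (System.nanoTime() - end < 0) {
            // intentionally empty
        }
    }
}
```

If total run time grows roughly in step with the knob setting multiplied by the number of calls, the probed code is on the critical path; if run time barely moves, it probably is not.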
This is just a poor man's way of doing it. The best method is probably to use a profiler. If you specify your language and platform, someone could probably recommend a good profiler.
Without knowing what kind of system you are working with, here are a few pieces of gratuitous advice:
Try to build up knowledge on how the system scales: how are 10 times more users handled, how does it cope with 100 times more data, or with a 100 times slower network environment...
Find the proper 'probing' points in the system: a distributed system is, of course, harder to analyze than a desktop app.
Find proper technology to analyze the data received from the probes. Profilers do a great job visualizing bottleneck functions, but I can imagine they are of no help for your cloud service. Try to graphically visualize your data, your brain is much better at recognizing graphical patterns than numerical, let alone textual.
Oh, and find out what the expectations are! It's no use optimizing the boot time of your app if it's only booted three times a year.
I'd say the steps would be:
Identify the actual functionality that is slow, based on use of the system or interviewing users. That should narrow down the problem areas (and if nobody is complaining, maybe there's no problem.)
Run a code profiler (such as dotTrace / Compuware) and data layer profiler (e.g. SQL Profiler, NHibernate Profiler, depending on what you're using.) You should get some good results after a day or so of real use.
If you can't get a good idea of the problems from this, add some extra stopwatch code to the next live build that logs the number of milliseconds in each operation.
That should give you a pretty good picture of the multiple database queries that should be combined into one, or code that can be moved out of an inner loop or pre-calculated, etc.
What I want to do is generate a call tree with CPU timing information for a Java application as it goes through a scripted task. The idea is to see how much time is spent in each part of the code, and how this changes when I change the code or the task, but to do so in a consistently repeatable way.
In Java VisualVM I can do this interactively by clicking to start and stop profiling, but I would like to automate the process so I can get more consistent results (and not get so bored). Can VisualVM do this, or is there another profiler that can?
If I were a profiler vendor I would have to be concerned about providing people what they think they want, even if what they think they want does not solve the problem they have.
The thing is, only some problems can be found by knowing how long routines typically take, and if you ignore the ones you don't find that way, they will become the dominant part of how much time your program takes.
Here is a recent example of what I mean:
A program spends 50% of its wall-clock time reading .dll files to look up string resources to get the names of files so that the strings can be displayed on a splash screen so the user can see that something is happening during application startup. That means, if there were some other way to provide eye-candy to the user, the app could start up twice as fast.
During this process, the call stack is typically 15-20 functions deep, so it's really hard to tell what's going on just by having timing numbers for the functions.
What makes the problem difficult is that it is semantic. No particular routine is "hot" in a way that it could be speeded up.
The only "hot" thing is the general description, overall, of what the program is doing, and no tool can isolate it for you.
Only you can recognize it.
However, if you simply interrupted the program and examined the call stack during startup, the probability is 50% that you would see the entire explanation for the time being spent.
If you do it several times, you have the basis of the random pausing technique. Some programmers rely on it because it will find every problem profilers can find, and more; others look down on it because it isn't a tool.
So do it interactively, or else extract a small number of stack samples using something analogous to pstack.
I have a web server program written in Java and my boss wants it to run faster.
I've always been happy if it ran without error, so efficiency is new to me.
I tried a profiler, but it crashed my computer and turned out to be a dead open-source project.
I have no idea what I am doing apart from reading a few questions on here. I see that refactoring code is the best option, but I'm not sure how to go about it, and I gather that I need a profiler to see which code to refactor.
So does anyone know of a free profiler that I can use? I'm using Java and Eclipse. If possible, some instructions or a link to easy instructions would be great.
But what I really want, if anyone can give it, is a basic introduction to the subject so that I understand enough to go and research it in depth and get the best results.
I am a complete beginner when it comes to optimising code, and the subject seems very complex from what I have seen so far. Any help with how to get started would be greatly appreciated.
I'm new to Java as well, so saying things like "check garbage collection" would mean nothing to me; I'd need a more detailed explanation.
EDIT: The program uses Tomcat for the networking. It connects to an SQL database. The main function is a polling loop which checks all attached devices on the network, reads events from them, writes each event to the database, and then performs the event functions.
I am trying to improve the polling loop. The program is heavily multithreaded and uses a lot of interfaces and proxies, so it is hard to see where the code goes the farther you get from the polling loop.
I hope this information helps you offer solutions. Also, I did not build it; I inherited the code.
First of all detect the bottlenecks. There is no point in optimizing a method from 500ms to 400ms when there is a method running for 5 seconds, when it should run for 100ms.
You can try using VisualVM as a profiler; it is bundled with the JDK.
If you want a free profiler, use VisualVM, which comes with Java. It is likely to be enough.
You should ask your boss exactly what he or she would like to go faster. There is no point optimising random pieces of code that he or she might not care about. (It's easily done.)
You can also log key points in your task/request to determine what it spends the most time doing.
EDIT: The program uses Tomcat for the networking. It connects to an
SQL database. The main function is a polling loop which checks all
attached devices on the network, reads events from them, writes each
event to the database, and then performs the event functions.
I am trying to improve the polling loop. The program is heavily
multithreaded and uses a lot of interfaces and proxies, so it is hard
to see where the code goes the farther you get from the polling loop.
This sounds like you have a heavily I/O bound application. There really isn't much that you can do about that because I/O bound applications aren't inefficiently using the CPU--they're stuck waiting for I/O operations on other devices to complete.
FWIW, this scenario is actually why a lot of big companies are contemplating moving toward cheap, ARM-based solutions. They're wasting a lot of power and resources on powerful x86 CPUs that get underutilized while their code sits there waiting for a remote MySQL or Oracle server to finish doing its thing. With such an application, why throw more CPU than you need?
If you're new to Java, then optimization sounds like a bad idea. It's very easy to get wrong. It's not trivial to rewrite code and keep all the outputs the same while changing the inner workings.
Possibly have a look at your stored procedures and replace any IN statements with INNER JOIN. That's a fairly low-risk and high-reward way of speeding things up.
Start by identifying the time taken by various steps in your application (use logging to identify). Notice if there is anything unusual.
Step into each of these steps to see if there are any bottlenecks. Identify if something can be cached to save a db call. Identify if there is scope of parallelism by breaking down your tasks into independent units.
Hope you have some unit/ integration tests to ensure you don't accidentally break anything.
1. Measure (with a profiler - as others suggested, VisualVM is good) and locate the spots where your program spends most of its time.
2. Analyze the hot spots and try to improve their performance.
3. Measure again to verify that your changes had the expected effect.
4. If needed, repeat from step 1.
Start very simple.
Make a list of what's slow from a user perspective.
Try to do high-level profiling yourself. Maybe an interceptor that prints the run time for your actions.
Then profile only those actions with Start time = System.currentTime...
This easy way could be a starting point into more advanced profiling, and if you're lucky it may fix your problems.
Before you start optimizing, you have to understand your workload, and you have to be able to recreate that workload. One easy way to do that is to log all requests, in production, with enough detail that you can recreate the requests in a development environment.
At the same time that you log your load, you can also log the performance of those requests: the time from the start of the request to the end. One way to do that (and, incidentally, to capture the data needed to log the request) is to add a servlet filter into your stack.
Then you can start to think about optimization.
1. Establish performance goals. Simply saying "make it faster" is pointless. Instead, you need to establish goals such as "all pages should respond within 1.5 seconds, as long as there are fewer than 100 concurrent users."
2. Identify the requests that fail your performance goals. Focus on the biggest failure first.
3. Identify why the request takes so long.
To do #3, you need to be able to recreate load in a development environment. Then you can either use a profiler, or simply add trace-level logging into your application to find out how long each step of the process takes.
There is also a whole field of holistic optimization, of which garbage collection tuning is probably the most important. But again, you need to establish and replicate your workload, otherwise you'll be flailing.
When starting to optimize an application, the main risk is trying to optimize every step, which often does not improve the program's efficiency as expected and results in unmaintainable code.
It is likely that 80% of the execution time of your program is caused by a single step, which is itself only 20% of the code base.
The first thing to do is to identify this bottleneck. For example, you can log timestamps (using System.nanoTime and/or System.currentTimeMillis and your favorite logging framework) to do this.
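As a minimal sketch of that kind of timestamp logging, assuming `java.util.logging` as the logging framework and a hypothetical helper named `StepTimer.timed`, each coarse step can be wrapped so that its elapsed time is logged even when it throws:

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical helper: wraps one step of the program and logs how long it took.
class StepTimer {
    private static final Logger LOG = Logger.getLogger("perf");

    static <T> T timed(String step, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            // Runs whether the step returns normally or throws.
            long micros = (System.nanoTime() - start) / 1_000L;
            LOG.info(step + " took " + micros + " us");
        }
    }
}
```

A call such as `StepTimer.timed("loadOrders", () -> loadOrders())` (where `loadOrders` stands in for one of your own steps) leaves the return value untouched. `System.nanoTime` is used here because it is meant for measuring elapsed time, whereas `System.currentTimeMillis` reports wall-clock time and can jump if the system clock is adjusted.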
Once the step has been identified, try to write a test class which runs this step, and run it with a profiler. I have had good experience with both HPROF (http://java.sun.com/developer/technicalArticles/Programming/HPROF.html), although it might require some time to get familiar with, and the Eclipse Test and Performance Tools Platform (http://www.eclipse.org/tptp/). If you have never used a profiler, I recommend you start with Eclipse TPTP.
The execution profile will help you find out in what methods your program spends time. Once you know them, look at the source code, and try to understand why it is slow. It might be because (this list is not exhaustive) :
unnecessary costly operations are performed,
a sub-optimal algorithm is used,
the algorithm generates lots of objects, thus giving a lot of work to the garbage collector (especially true for objects which have a medium to long life expectancy).
If there is no visible defect in the code, then you might consider :
making the algorithm more parallel in order to leverage all your CPUs
buying faster hardware.
Regarding JVM options, the two most important ones for performance are :
-server, in order to use the server VM (enabled by default depending on the hardware) which provides better performance at the price of a slower startup (http://stackoverflow.com/questions/198577/real-differences-between-java-server-and-java-client),
-Xms and -Xmx, which define the heap size available on startup and the maximum amount of memory that the JVM can use. If the JVM is not given enough memory, garbage collection will use a lot of your CPU resources, slowing down your program. However, if the JVM already has enough memory, increasing the heap size will not improve performance, and might even cause longer GC pauses. (http://stackoverflow.com/questions/1043817/speed-tradeoff-of-javas-xms-and-xmx-options)
Other parameters usually have lower impact, you can consult them at http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html.
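To check what heap limits a running JVM actually ended up with, the standard `Runtime` API can be queried from inside the program. This is a small sketch; `HeapInfo` and its method names are hypothetical.

```java
// Hypothetical helper: reports the heap limits the running JVM is using.
class HeapInfo {
    private static final long MIB = 1024L * 1024L;

    // Upper bound the heap may grow to (governed by -Xmx when it is set).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    // Heap currently reserved by the JVM (starts near -Xms when it is set).
    static long committedHeapMiB() {
        return Runtime.getRuntime().totalMemory() / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB):       " + maxHeapMiB());
        System.out.println("committed heap (MiB): " + committedHeapMiB());
    }
}
```

Running it once with and once without, for example, `-Xms256m -Xmx1g` on the `java` command line shows whether the options are reaching the JVM that actually runs your server.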
I'm developing a game for Android. It uses a surface view and uses the sort of standard 2D drawing APIs provided. When I first released the game, I was doing all sorts of daft things like re-drawing 9-patches on each frame and likewise with text. I have since optimised much of this by drawing to Bitmap objects and drawing them each frame, only re-drawing onto the Bitmap objects when required.
I've received complaints about battery drain before, and following my modifications I'd like to know (scientifically) if I've made any improvements. Unfortunately, I don't have any prior data to go by, so it would be most useful to compare the performance to some other game.
I've been running Traceview, and using the results of it mostly for the purposes of identifying CPU-time-consuming methods.
So -- what's the best way of determining my app's battery performance, and what's a good benchmark?
I know I can look at the %s of different apps through the settings, but this is again unscientific, as the figure I get from this also depends on what's happening in all of the other apps. I've looked through (most of) Google's documentation, and although the message is clear that you should be saving battery (and it gives the occasional tip as to how), there is little indication of how I can measure how well my app is performing. The last thing I want are more complaints of battery drain in the Android Market!
Thanks in advance.
EDIT
Thanks for all your helpful advice/suggestions. What I really want to know is how I can use the data coming from Traceview (ie: CPU time in ms spent on each frame of the game) to determine battery usage (if this is at all possible). Reading back on my original question, I can see I was a bit vague. Thanks again.
Here is my suggestion:
I watch power consumption while developing my apps (that sometimes poll the sensors at rates of <25ns) using PowerTutor. Check it out; it sounds like this may be what you are looking for. The app tells you what you are using in mW, J, or relative to the rest of the system. Results are also broken down by CPU, WiFi, Display, and any other radios installed. The only catch is that it is written for a specific phone model, but I use it with great success on my EVO 4G, Galaxy S (Sprint Epic), and Hero.
Good luck,
-Steve
It is possible that your game is draining the battery. I believe this could be for several reasons:
Your application is a game. Games drain the battery quickly.
You're iterating with the help of a Thread. Have you limited the FPS so that the CPU skips unnecessary iterations? Since you're working in 2D I assume you're using SurfaceView. 60 FPS will be enough for a real-time game.
You don't stop the Thread when your application terminates, so your code keeps iterating when your application isn't alive.
Do you have a lock in the loop that calls wait() during onPause?
The people commenting that your game is draining the battery probably mean when your application isn't in use. Otherwise it would be weird, because every game on Android Market drains the battery, more or less.
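The points about capping the frame rate and blocking on wait() while paused can be sketched in plain Java. This is an assumption-heavy outline rather than Android code: `GameLoop` and its methods are hypothetical names, the update-and-draw step is reduced to a frame counter, and in a real app `pause()`, `resume()` and `stop()` would presumably be called from the activity's lifecycle callbacks.

```java
// Hypothetical sketch of a frame-limited loop that idles while paused.
class GameLoop implements Runnable {
    private static final long FRAME_NANOS = 1_000_000_000L / 60; // 60 FPS cap
    private final Object pauseLock = new Object();
    private boolean paused = false;          // guarded by pauseLock
    private volatile boolean running = true;
    private volatile long frames = 0;        // written only by the loop thread

    @Override
    public void run() {
        while (running) {
            synchronized (pauseLock) {
                // Block without using CPU for as long as the loop is paused.
                while (paused && running) {
                    try {
                        pauseLock.wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
            if (!running) {
                break;
            }
            long frameStart = System.nanoTime();
            frames++; // placeholder for "update state and draw one frame"
            long remaining = FRAME_NANOS - (System.nanoTime() - frameStart);
            if (remaining > 0) {
                try {
                    // Sleep away the rest of the frame instead of spinning.
                    Thread.sleep(remaining / 1_000_000L, (int) (remaining % 1_000_000L));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    void pause()  { synchronized (pauseLock) { paused = true; } }
    void resume() { synchronized (pauseLock) { paused = false; pauseLock.notifyAll(); } }
    void stop()   { running = false; synchronized (pauseLock) { pauseLock.notifyAll(); } }
    long frames() { return frames; }
}
```

While paused, the loop thread sits in `wait()` and renders nothing, and `stop()` lets the thread exit so it does not keep iterating after the application has gone away.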
If you're trying to gauge the "improvement over your previous version", I don't think it makes sense to compare to another game! Unless those two games do the exact thing, this is as unscientific as it gets.
Instead, I would grab the previous version of your app from source control, run it, measure it, and then run it with the latest code and compare it again.
To compare, you could for example use the command line tool "top" (definitely available in busybox if your phone is rooted, not sure if it comes with a stock phone. Probably not). That shows you the CPU usage of your process in percent.
There is a battery profiler developed by Qualcomm called Trepn Profiler: https://developer.qualcomm.com/mobile-development/increase-app-performance/trepn-profiler
how I can use the data coming from Traceview (ie: CPU time in ms spent on each frame of the game) to determine battery usage (if this is at all possible)
In theory it would be possible to extrapolate the battery usage for your app by looking at the power consumption on a frame-by-frame basis. The best way to accomplish this would be to evaluate the power consumption of the CPU (only) for a given period (say two seconds) while your app is running its most CPU-intensive operation (GPU power usage could be gleaned the same way), while recording TraceView data (such as frames per second or flops per second) giving you the traffic across the CPU/GPU for a given millisecond. Using this data you could calculate the average peak power consumption for your app by running the above test a few times.
Here is why I say it is theory only: There are many variables to consider:
The number and nature of other processes running at the time of the above test (processor intensive)
Method of evaluating the power draw across the CPU/GPU (while tools such as PowerTutor are effective for evaluating power consumption, in this case the evaluation would not be as effective because of the need to collect time stamped power usage data. Additionally, just about any method of collecting power data would introduce an additional overhead (Schrödinger's cat) but that strictly depends on the level of accuracy you require/desire.)
The reason for the power consumption information - If you are looking to define the power consumption of your app for testing or BETA testing/evaluation purposes, then it is a feasible task with some determination and the proper tools. If you are looking to gain usable information about power consumption "in the wild", on users' devices, then I would say it is plausible but not realistic. The variables involved would make even the most determined and dedicated researcher faint. You would have to test on every possible combination of device/Android version in the wild. Additionally, the number of combinations of running processes/threads and installed apps is likely incalculable.
I hope this provides some insight to your question, although I may have gone deeper into it than needed.
-Steve
For anyone looking, one resource we've been using that is extremely helpful is a free app from AT&T called ARO.
Give it a look: ARO
It has helped me before and I don't see it mentioned very often so thought I'd drop it here for anyone looking.
"I know I can look at the %s of
different apps through the settings,
but this is again unscientific, as the
figure I get from this also depends on
what's happening in all of the other
apps."
The first thing I'd do is hunt for an app already out there that has a known, consistent battery usage, and then you can just use that as a reference to determine your app's usage.
If there is no such app, you will have to hope for an answer from someone else... and if you are successful making such an app, I would suggest selling your new "battery usage reference" app so that other programmers could use it. :)
I know this question is old and it's late, but for anyone who comes here looking for a solution I suggest you take a look at JouleUnit test:
http://dnlkntt.wordpress.com/2013/09/28/how-to-test-energy-consumption-on-android-devices/
It integrates into eclipse and gives you a great amount of detail about how much battery your app is consuming.
I know of three options that can help you get a scientific measurement:
Use hardware specifically built for this, such as the Monsoon High Voltage Power Monitor.
https://msoon.github.io/powermonitor/PowerTool/doc/Power%20Monitor%20Manual.pdf
Download and install Trepn Profiler (a tool from Qualcomm) on your phone. You won't need a computer for reporting. Reports are live and real-time on the phone. You can download Trepn Profiler from the following link: https://play.google.com/store/apps/details?id=com.quicinc.trepn&hl=en_US
Please note that on recent phones (with Android 6+) it works in estimation mode. If you need accurate numbers, you need one of a select list of devices. Check the following link for the list:
https://developer.qualcomm.com/software/trepn-power-profiler/faq
You can profile apps separately, and the whole system.
Use Batterystats and Battery Historian from Google.
https://developer.android.com/studio/profile/battery-historian
We have a Java ERP type of application. Communication between server and client is via RMI. In peak hours there can be up to 250 users logged in, and about 20 of them are working at the same time. This means that about 20 threads are live at any given time in peak hours.
The server can run for hours without any problems, but all of a sudden response times get higher and higher. Response times can be in minutes.
We are running on Windows 2008 R2 with Sun's JDK 1.6.0_16. We have been using perfmon and Process Explorer to see what is going on. The only thing that we find odd is that when the server starts to slow down, the number of handles the java.exe process has open is around 3500. I'm not saying that this is the actual problem.
I'm just curious if there are some guidelines I should follow to be able to pinpoint the problem. What tools should I use? ....
Do you have access to the log configuration of this application?
If you do, you should change the log level to "DEBUG". Tracing the DEBUG logs of a request could give you useful information about the contention point.
If you don't, profiler tools can help you:
VisualVM (Free, and good product)
Eclipse TPTP (Free, but more complicated than VisualVM)
JProbe (not Free but very powerful. It is my favorite Java profiler, but it is expensive)
If the application has been developed with JMX control points, you can plug in a JMX viewer to get information...
If you want to stress the application to trigger the problem (to verify whether it is a load problem), you can use stress tools like JMeter.
Sounds like the garbage collection cannot keep up and starts "halt-the-world" collecting for some reason.
Attach with jvisualvm in the JDK when starting and have a look at the collected data when the performance drops.
The problem you're describing is quite typical, but general as well. Causes can range from memory leaks and resource contention to bad GC policies and heap/PermGen space allocation. To pin down the exact problems with your application, you need to profile it (I am aware of tools like YourKit and JProfiler). If you profile your application wisely, a few application cycles should be enough to reveal the problems; otherwise profiling is not very easy in itself.
In a similar situation, I have coded a simple profiling code myself. Basically I used a ThreadLocal that has a "StopWatch" (based on a LinkedHashMap) in it, and I then insert code like this into various points of the application: watch.time("OperationX");
then after the thread finishes a task, I'd call watch.logTime(); and the class would write a log that looks like this: [DEBUG] StopWatch time:Stuff=0, AnotherEvent=102, OperationX=150
After this I wrote a simple parser that generates CSV from this log (per code path). The best thing you can do is to create a histogram (easily done using Excel). Averages, medians and even modes can fool you, so I highly recommend creating a histogram.
Together with this histogram, you can create line graphs using the average/median/mode (whichever represents the data best; you can determine this from the histogram).
This way, you can be 100% sure exactly what operation is taking time. If you can't determine the culprit, binary search is your friend (fine grain the events).
Might sound really primitive, but works. Also, if you make a library out of it, you can use it in any project. It's also cool because you can easily turn it on in production as well..
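The answer does not show the StopWatch class itself, so the following is only a guess at its shape. The names `time` and `logTime` and the log format come from the description above; `current()` is a hypothetical accessor, the values are assumed to be milliseconds since the previous mark, and `logTime` here returns the line instead of writing to a particular logging framework.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch of a per-thread stopwatch along the lines described above.
class StopWatch {
    private static final ThreadLocal<StopWatch> CURRENT =
            ThreadLocal.withInitial(StopWatch::new);

    // LinkedHashMap keeps events in the order they were first recorded.
    private final Map<String, Long> millisByEvent = new LinkedHashMap<>();
    private long lastMark = System.nanoTime();

    // Hypothetical accessor for the calling thread's stopwatch.
    static StopWatch current() {
        return CURRENT.get();
    }

    // Records the time elapsed since the previous mark under this event name.
    void time(String event) {
        long now = System.nanoTime();
        millisByEvent.merge(event, (now - lastMark) / 1_000_000L, Long::sum);
        lastMark = now;
    }

    // Builds one log line for the finished task and resets for the next one.
    String logTime() {
        StringJoiner line = new StringJoiner(", ", "StopWatch time:", "");
        millisByEvent.forEach((event, millis) -> line.add(event + "=" + millis));
        millisByEvent.clear();
        lastMark = System.nanoTime();
        return line.toString();
    }
}
```

Usage would be `StopWatch.current().time("OperationX");` at the points of interest and a single `StopWatch.current().logTime()` when the thread finishes its task, passing the returned string to whatever logger the application already uses.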
Aside from the GC that others have mentioned, try taking thread dumps every 5-10 seconds for about 30 seconds during your slowdown. There could be a case where DB calls, a web service, or some other dependency becomes slow. If you take a look at the thread dumps you will be able to see threads which don't appear to move, and you could narrow down your culprit that way.
From the GC stand point, do you monitor your CPU usage during these times? If the GC is running frequently you will see a jump in your overall CPU usage.
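Comparing dumps by eye works, but the "threads which don't appear to move" check can also be roughed out in code when you are able to run something inside the JVM. This sketch uses only the standard `Thread.getAllStackTraces()` call; `StuckThreads`, `snapshot` and `unchanged` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: finds threads whose stack is identical in two dumps.
class StuckThreads {
    // Maps "threadName#threadId" to a printable form of that thread's stack.
    static Map<String, String> snapshot() {
        Map<String, String> stacks = new HashMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (e.getValue().length == 0) {
                continue; // nothing useful to compare
            }
            Thread t = e.getKey();
            stacks.put(t.getName() + "#" + t.getId(), Arrays.toString(e.getValue()));
        }
        return stacks;
    }

    // Threads present in both snapshots with exactly the same stack.
    static List<String> unchanged(Map<String, String> before, Map<String, String> after) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, String> e : before.entrySet()) {
            if (e.getValue().equals(after.get(e.getKey()))) {
                stuck.add(e.getKey());
            }
        }
        return stuck;
    }
}
```

Idle pool threads will also show up as unchanged, so the output is a list of candidates to read, not a verdict. A thread that stays parked in the same JDBC or socket read across several snapshots is the kind of thing to look for.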
If only this was a Solaris box, prstat would be your friend.
For acute issues like this a quick jstack <pid> should quickly point out the problem area. Probably no need to get all fancy on it.
If I had to guess, I'd say HotSpot jumped in and tightly optimised some badly written code. NetBeans grinds to a halt where it uses a WeakHashMap with newly created objects to cache file data. When optimised, the entries can be removed from the map straight after being added. Obviously, if the cache is being relied upon, much file activity follows. You probably won't see the drive light up, because it'll all be cached by the OS.
I'm trying to profile my java app, just to find out the methods in which most time is being spent. Given the poor reactions here to TPTP, I thought I'd give Java VisualVM a go.
It all seemed rather simple to use - except that I can't seem to get anything consistent or useful out of it.
I can't seem to see anything relating to MY OWN code - all I get is a whole bunch of calls to things like java.* methods.
I've tried restricting instrumentation to only my own packages, which seems to cut down the number of methods instrumented, but still I don't ever seem to see my own.
Each time I run, I get varying numbers of methods instrumented, ranging from 10's to 1000's.
I've tried putting in a sleep at the start of my app, to make sure I get VisualVM up and running before my app starts to do anything interesting, to make sure it's profiling when the interesting stuff is running.
Is there something I have to do to ensure my classes get instrumented ?
Are there timing issues ? ..like, have to wait for classes to be loaded etc ?
I've also tried running the guts of the code twice, to make sure all the code does get exercised...
I'm just running an app, with a main, from Eclipse. I've tried using the Eclipse integration so that VisualVM starts up when I start the app - results are the same.
I've also tried exporting the app as a runnable app, and running it standalone from the command line, rather than through Eclipse - same result.
My app is not a long running web app etc - just a main that calls some other of my own classes to do some processing, then quits.
I'd be grateful for any advice about what I might be doing wrong ! :)
Thanks !
I too am struggling with VisualVM, which is a shame because its user interface is fantastic while its profiling output seems horrific. You can see my question here.
Java VisualVM giving bizarre results for CPU profiling - Has anyone else run into this?
I can tell you a couple of odd things that I have learned about VisualVM and the way it seems to do its profiling.
VisualVM appears to be counting the total time spent inside a method (wall-clock time). I have a thread in my application which starts a number of other threads and then immediately blocks waiting for a message on a queue. VisualVM will not register this method in the profiler until one of the other threads sends the message the first thread was waiting for (when the application terminates). Suddenly the blocking method call dominates the profiling output and is recorded as taking up more than 80% of the application time.
Other profilers, such as JProfiler and the one used by Azul, do not count a blocked thread as taking up time for the profiler. This means that blocking methods which probably aren't interesting (situation dependent) for performance profiling are obscuring your view of the code that is eating your CPU time.
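The difference between wall-clock time and CPU time for a blocked thread can be seen directly with the standard `ThreadMXBean` API. This is a small illustration with hypothetical names (`WallVsCpu`, `measure`); it says nothing about how VisualVM or JProfiler are implemented internally.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: measures wall-clock and CPU time for one piece of work.
class WallVsCpu {
    // Returns {wallNanos, cpuNanos}; cpuNanos is -1 if the JVM cannot measure it.
    static long[] measure(Runnable body) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = mx.isCurrentThreadCpuTimeSupported();
        long cpuStart = cpuSupported ? mx.getCurrentThreadCpuTime() : -1;
        long wallStart = System.nanoTime();
        body.run();
        long wall = System.nanoTime() - wallStart;
        long cpuEnd = cpuSupported ? mx.getCurrentThreadCpuTime() : -1;
        long cpu = (cpuStart >= 0 && cpuEnd >= 0) ? cpuEnd - cpuStart : -1;
        return new long[] { wall, cpu };
    }
}
```

Passing in a body that only sleeps or waits on a queue gives a large wall-clock figure and a CPU figure close to zero, which is why a profiler that reports wall-clock time can show a blocking call at the top of the list.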
When I am running my profiling I end up with
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run()
obscuring my profiling right up until that message comes back to the waiting thread and then the top spot is shared between these two totally irrelevant methods, as well as various other uninteresting methods which don't appear on other profilers.
Secondly and I think quite importantly the method filtering mechanism doesn't work as I would have expected. This means that I can't filter out the I am trying to track down what the story is with this right now.
Not a really helpful answer. The solution as I see it right now is to pay for JProfiler - VisualVM just doesn't seem trustworthy for this task.
You could take a look at AppDynamics Lite. It has some nice features, such as business transaction discovery, which can sample all calls made to a specific method in your code.
The Lite version has a lot of limitations, such as a maximum of 10 minutes of sampling and a maximum of 30 discovered business transactions.
It would be nice to have a free tool that does the same.
I assume this isn't just an academic question - you would like to see if you could make the app run faster. I assume you also wouldn't mind a little "out of the box" thinking. There are many popular ideas about performance that are actually pretty fuzzy.
For example, you say you're looking for "methods in which most time is being spent". If by that you mean "self time" (program counter actually in the method) there is probably very little, unless you've got some intense loops. Methods generally spend time by calling other methods, sometimes doing I/O.
Another fuzzy idea is that measuring method time or counting the number of calls can tell you very much about where bottlenecks are. Bottlenecks are specific lines of code, not methods, so even if you know approximately where to look, you're still playing detective.
So those are a few of the fuzzy ideas. Here is a bunch more. Let me suggest how one should think about it, and how that leads to results.
When you eventually fix something, it will reduce execution time by some percent, like (pick a number) 30%, right? (Otherwise you didn't fix anything.) OK, during that 30% it was doing something, something that it didn't need to do because later you got rid of it. So, you don't need to measure. You do need to find out what it is doing in that time, so you know what to get rid of.
A very simple way is to "pause" it 10 (or some number of) times at random. Understand what it is doing and why, by looking at the call stack and possibly some of the data. On about 3 of those times you will see it doing something you could get rid of.
You will know approximately how much it will save by seeing what percent of samples is showing it. Approximate is good enough. You can easily see exactly how much time is saved by stopwatching it before and after.
Then, don't stop. You've made the app faster. Do it again, and make it faster yet. Sooner or later you get to a point where you can't make it any faster, but it's probably in more than one step.
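The arithmetic behind "what percent of samples is showing it" is simple enough to write down. In this sketch (hypothetical names `SampleEstimate`, `fraction` and `speedupIfRemoved`), an activity seen on `hits` out of `samples` pauses is assumed to cost about that fraction of the run time, so removing it entirely would speed the program up by about 1 / (1 - fraction).

```java
// Hypothetical helper: turns pause counts into a rough savings estimate.
class SampleEstimate {
    // Fraction of samples in which the activity was seen.
    static double fraction(int hits, int samples) {
        if (samples <= 0 || hits < 0 || hits > samples) {
            throw new IllegalArgumentException("need 0 <= hits <= samples and samples > 0");
        }
        return (double) hits / samples;
    }

    // Estimated speedup factor if the activity were removed completely.
    static double speedupIfRemoved(int hits, int samples) {
        return 1.0 / (1.0 - fraction(hits, samples));
    }
}
```

With 3 hits in 10 samples the estimate is a 30% saving, or a speedup of about 1.43 times. With so few samples the figure is rough, which is the point made above: approximate is good enough, and a stopwatch before and after gives the exact number.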