I am programming a 60 FPS game and need to find the code sections that are badly affecting the game loop's performance. I currently store a nanosecond timestamp at the start of the frame and split my code into logical sections, for example "Rendering particles" or "Iterating through enemy AIs". After each section finishes executing, I store the time elapsed since the previous section finished. At the end I take the total execution time ([nanoseconds now] minus [first nanosecond timestamp]) and calculate the percentage contributed by each code section. This lets me display the percentage taken by each section, but it doesn't seem to be the perfect solution. I suspect GC pauses randomly inflate individual section timings.
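For reference, the per-section timing looks roughly like this (a minimal sketch; the SectionTimer class and the section names are purely illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the per-section frame timing described above.
class SectionTimer {
    private final Map<String, Long> sectionNanos = new LinkedHashMap<>();
    private long frameStart;
    private long lastMark;

    void beginFrame() {
        frameStart = System.nanoTime();
        lastMark = frameStart;
        sectionNanos.clear();
    }

    // Called after each logical section finishes.
    void mark(String section) {
        long now = System.nanoTime();
        sectionNanos.merge(section, now - lastMark, Long::sum);
        lastMark = now;
    }

    // Percentage of the frame taken by each section.
    void report() {
        long total = System.nanoTime() - frameStart;
        for (Map.Entry<String, Long> e : sectionNanos.entrySet()) {
            System.out.printf("%s: %.1f%%%n", e.getKey(), 100.0 * e.getValue() / total);
        }
    }
}
```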
Is there a better way to implement this, or even an API or analytics tool that does exactly this?
One of the commenters recommended traceview, which is generally useful for instrumenting code written in Java to figure out what's slow. It's less useful for a mix of Java and native (because the instrumentation disproportionately slows the Java code), and it lacks useful context when you're trying to figure out if you're meeting or missing deadlines when generating frames with OpenGL.
The preferred way to do this sort of analysis is with systrace (doc, explanation). As you can see from the second link, you can compare your app timeline directly with the vsync event and surface composition activity.
If you're running Android 4.3 (API 18) or later, you can add your own events. Adding these to strategic places in your app makes it much easier to visualize your bottlenecks. A simple example can be found here. (As of when I'm writing this, the official docs don't yet describe the 4.3 command-line usage, so it's easiest to just follow the example.)
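For reference, the custom events mentioned above are added with the android.os.Trace API (API 18+); a minimal sketch, where the section label and the method names are illustrative:

```java
import android.os.Trace;

// Wrap a strategic chunk of work in a named trace section (API 18+).
// The label then shows up on your app's row in the systrace timeline.
void drawFrame() {
    Trace.beginSection("Rendering particles");
    try {
        renderParticles(); // illustrative stand-in for the real work
    } finally {
        Trace.endSection(); // must be called on the same thread
    }
}
```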
Android 4.3 also added the "dalvik" tag; if included, your traces will show where the GC pauses start and end, as well as full details on which threads actually got suspended.
Related
I am using NetBeans IDE 7.4. I want to find the lines of code where most of the running time is spent. I have heard a little about profilers, which can be used for thread monitoring and so on.
But I don't know exactly how to find the sections of code that are most frequently executed in my program. I want to understand the mechanisms and facilities the JVM itself provides for this, not just third-party packages (profilers, etc.).
You can profile the CPU with VisualVM to find out which methods are consuming the most CPU time. You will have to set up a filter (a regex) to focus on your own classes.
Suppose line L of your code, if removed, would cut the overall wall-clock time in its thread by 50%. Then if you, at a random time, dump the stacks of all running threads, locate your thread, and disregard all the levels that are not your code, there is a 50% chance that you will see line L in the remaining lines.
So if you do this 10 times, you will see line L about 5 times, give or take.
In fact, if you see any line of your code on more than one stack sample, and you can remove or bypass it, you are guaranteed to save a healthy fraction of time.
What's more, this method (while crude) will find any speedup that profilers can find, and more that they can't.
The math behind it can be found here.
Example: A worker thread is spending 80% of its time doing I/O and allocating memory in the process of parsing XML to build a data structure. You can see that the XML comes from a data structure in a different piece of code in the same thread. It's big code - you wouldn't have known this without the samples pointing it out. You only have to take two or three samples before you see it twice. Just bypass the XML - 5x speedup.
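If you would rather automate this kind of sampling than pause in a debugger, here is a minimal sketch using the standard Thread.getAllStackTraces() (the package prefix, sample count, and interval are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Crude stack sampler: grab all thread stacks at random times and count how
// often each line of *your* code appears. Lines that show up on several
// samples are the ones worth removing or bypassing.
public class CrudeSampler {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < 10; i++) {                      // ~10 samples, as above
            for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
                for (StackTraceElement frame : stack) {
                    if (frame.getClassName().startsWith("com.myapp")) { // your code only
                        hits.merge(frame.toString(), 1, Integer::sum);
                    }
                }
            }
            Thread.sleep((long) (Math.random() * 500));     // sample at random times
        }
        hits.entrySet().stream()
            .sorted((a, b) -> b.getValue() - a.getValue())
            .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

In practice you would run this on a daemon thread inside the process you are investigating; it is a standalone main() here only for the sake of the example.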
I am trying to compare the accuracy of timing methods with C++ and Java.
With C++ I usually use clock() and CLOCKS_PER_SEC: I run the block of code I want to time for a certain amount of time and then calculate how long it took, based on how many times the block was executed.
With Java I usually use System.nanoTime().
Which one is more accurate, the one I use for C++ or the one I use for Java? Is there any other way to time in C++ so I don't have to repeat the piece of code to get a proper measurement? Basically, is there a System.nanoTime() method for C++?
I am aware that both use system calls which cause considerable latencies. How does this distort the real value of the timing? Is there any way to prevent this?
Every timing method has errors. Before you spend a great deal of time on this question, ask yourself "how accurate do I need my answer to be?" Usually the solution is to run a loop / piece of code a number of times, and keep track of the mean and standard deviation of the measurements. This is a good way to get a handle on the repeatability of your measurement. After that, assume that latency is "comparable" between the "start time" and "stop time" calls (regardless of what function you used), and you have a framework for understanding the issues.
Bottom line: the clock() function typically gives microsecond accuracy.
See https://stackoverflow.com/a/20497193/1967396 for an example of how to go about this in C (in that instance, using a microsecond-precision clock). Nanosecond timing is also possible; see, for example, the answer to "clock_gettime() still not monotonic - alternatives?", which uses clock_gettime(CLOCK_MONOTONIC_RAW, &tSpec);
Note that you have to extract seconds and nanoseconds separately from that structure.
Be careful using System.nanoTime() as it is still limited by the resolution that the machine you are running on can give you.
There are also complications when timing Java: the first few passes through a function will be a lot slower, until the JIT compiler optimizes it for your system.
Virtually all modern systems use pre-emptive multithreading and multiple cores, so all timings will vary from run to run (for example, if control gets switched away from your thread while it is in the method).
To get reliable timings you need to
Warm up the system by running the thing you are timing a few hundred times before you start measuring.
Run the code for a good number of times and average the results.
The reliability issues are the same for any language, so they apply just as well to C as to Java. C may not need the warm-up loop (there is no JIT compilation), but you will still need to take a lot of samples and average them.
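A minimal sketch of that warm-up-then-average pattern in Java (the workload and iteration counts are illustrative):

```java
// Warm up first, then time many runs with System.nanoTime() and report the
// mean and standard deviation (computed online with Welford's algorithm).
public class TimingHarness {
    static long workload() {                    // illustrative stand-in for the code under test
        long sum = 0;
        for (int i = 0; i < 100_000; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        long sink = 0;
        for (int i = 0; i < 500; i++) sink += workload();   // warm-up so the JIT kicks in

        int runs = 100;
        double mean = 0, m2 = 0;
        for (int i = 1; i <= runs; i++) {
            long t0 = System.nanoTime();
            sink += workload();                 // consume the result so it can't be optimized away
            double x = System.nanoTime() - t0;
            double d = x - mean;
            mean += d / i;
            m2 += d * (x - mean);
        }
        System.out.printf("mean = %.0f ns, stddev = %.0f ns (sink=%d)%n",
                mean, Math.sqrt(m2 / (runs - 1)), sink);
    }
}
```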
I am using VisualVM to see where my application is slow, but it does not show all methods; in particular, it probably does not show all the methods that delay the application.
I have a real-time application (sound processing) and am short of my deadline by a few hundred microseconds.
Is it possible that VisualVM hides methods which are fast themselves?
UPDATE 1
I found the slow method with the sampler and some guessing. It was a toString() method, called from debug logging that was turned off but still consuming time.
Adjusting the settings helped, and now I know how to see it: it depended on the "Start profiling from" option.
Other than the filters mentioned by Ryan Stewart, here are a couple of additional reasons why methods may not appear in the profiler:
Sampling profiles are inherently stochastic: a sample of the current stack of all threads is taken every N ms. Some methods which actually executed but which aren't caught in any sample during your run just won't appear. This is generally not too problematic since the very fact they didn't appear in any sample, means that with very high probability these are methods which aren't taking up a large part of your runtime.
When using instrumentation-based profiling in VisualVM (called "CPU profiling"), you need to define the entry point for profiled methods (the "Start profiling from" option). I have found that this fails for methods in the default package, and also that it won't pick up time in methods which are currently running when the profiler is attached (for the duration of the current invocation; it will catch later invocations). This is probably because the instrumented method won't be swapped in until the current invocation finishes.
Sampling is subject to a potentially serious issue with stack-trace-based profiling: samples are only taken at safepoints in the code. When a trace is requested, each thread is forced to a safepoint, and then the stack is taken. In some cases you may have a hot spot in your code which does no safepoint polling (common for simple counted loops that the JIT can prove terminate after a fixed number of iterations), interleaved with a bit of code that does have a safepoint poll. Your stacks will always show your process in the safepoint code, never in the safepoint-free code, even though the latter may be taking the majority of the CPU time.
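A contrived sketch of the kind of code where this bias shows up (the names are hypothetical, and whether the JIT actually elides the safepoint poll depends on the JVM version and flags):

```java
// The int-counted inner loop typically gets no safepoint poll, so a
// safepoint-based sampler attributes its time to surrounding code instead.
// (JVM-dependent behaviour; this is illustrative only.)
public class SafepointBias {
    static long hotButPollFree(long x) {
        for (int i = 0; i < 1_000_000; i++) {   // counted loop: JIT can prove it terminates
            x = x * 6364136223846793005L + 1442695040888963407L;
        }
        return x;
    }

    static long withPolls(long x) {
        for (long i = 0; i < 10_000; i++) {     // long-counted loop: usually keeps its poll
            x = hotButPollFree(x);
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(withPolls(42));      // sampled stacks tend to land in withPolls()
    }
}
```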
I don't have it in front of me at the moment, but before you start profiling, there's a settings pane that's hidden by default which lets you enter regexes for filtering out methods. By default, it filters out a lot of the core JDK stuff.
I had the same problem with my pet project. I added a package name and the problem was solved, though I don't understand why. (VisualVM 1.4.1, jdk1.8.0_181 and jdk-10.0.2, Windows 10.)
I'm currently prototyping a multimedia editing application in Java (pretty much like Sony Vegas or Adobe After Effects) geared towards a slightly different end.
Now, before reinventing the wheel, I'd like to ask if there's any library out there geared towards time simulation/manipulation.
What I mean specifically: an ideal solution would be a library that can do the following (sketched in code after the list):
Schedule and generate events based on an elastic time factor. For example, real time would have a factor of 1.0, slow motion would be any lower value, and a higher value would speed time up.
Provide configurable granularity. In other words, a way to specify how frequently time-based events will fire (30 frames per second, 60 fps, etc.)
An event execution mechanism, of course: a way to define that an event starts and terminates at certain points in time, and to get notified accordingly.
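Something like this, where every name is hypothetical; it's just a sketch of the kind of interface I mean:

```java
// Hypothetical API: an elastic clock mapping real time to simulation time
// through a speed factor, intended to drive fixed-rate (30/60 fps) ticks.
class ElasticClock {
    private double factor;                    // 1.0 = real time, < 1.0 = slow motion
    private double simTime;                   // simulation time, in seconds
    private long lastNanos = System.nanoTime();

    ElasticClock(double factor) { this.factor = factor; }

    void setFactor(double factor) { this.factor = factor; }  // speed up / slow down

    // Advance simulation time by (elapsed real time) * factor.
    double tick() {
        long now = System.nanoTime();
        simTime += (now - lastNanos) / 1e9 * factor;
        lastNanos = now;
        return simTime;
    }
}
```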
Is there any Java framework out there that can do this?
Thank you for your time and help!
Well, it seems that no such thing exists for Java. However, I found out that this is a specific case of a more general problem.
http://gafferongames.com/game-physics/fix-your-timestep/
Using fixed time stepping, my application can have frame skipping for free (i.e. when doing live preview rendering) and can render with no time constraints when in offline mode, which is pretty much what Vegas and other multimedia programs do.
Also, by using a delta factor between each frame, the whole simulation can be sped up or slowed down at will. So yeah, fixed time stepping pretty much nails it for me.
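The core of the fixed-timestep pattern from that article looks roughly like this (the step size, speed factor, and stub methods are illustrative):

```java
// Fixed-timestep loop: the simulation advances in constant dt steps while
// rendering runs at whatever rate it can; a speed factor scales time.
public class FixedTimestepLoop {
    static final double DT = 1.0 / 60.0;      // fixed simulation step: 60 updates/sec
    static double speed = 1.0;                // 1.0 = real time, < 1.0 = slow motion

    // Illustrative stubs for the real work.
    static void updateSimulation(double t, double dt) { /* physics, AI, ... */ }
    static void render(double alpha) { /* draw, interpolating by alpha in [0,1) */ }

    public static void main(String[] args) {
        double t = 0.0, accumulator = 0.0;
        long previous = System.nanoTime();
        boolean running = true;

        while (running) {
            long now = System.nanoTime();
            accumulator += (now - previous) / 1e9 * speed;  // scale real time by the factor
            previous = now;

            while (accumulator >= DT) {       // catch up: multiple steps = frame skip for free
                updateSimulation(t, DT);
                t += DT;
                accumulator -= DT;
            }
            render(accumulator / DT);         // leftover fraction, for interpolation
        }
    }
}
```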
I am a dummy at profiling, so please tell me how you profile your applications. Which is better, profiling the whole application or profiling parts in isolation? If the choice is isolation, how do you do that?
As far as possible, profile the entire application, running a real (typical) workload. Anything else and you risk getting results that lead you to focus your optimization efforts in the wrong place.
EDIT
Isn't it too hard to get a correct result when profiling the whole application? The test result then depends on user interaction (button clicking, etc.) rather than on an automated task. Tell me if I'm wrong.
Getting the "correct result" depends on how you interpret the profiling data. For instance, if you are profiling an interactive application, you should figure out which parts of the profile correspond to waiting for user interaction, and ignore them.
There are a number of problems with profiling your application in parts. For example:
By deciding beforehand which parts of the application to profile, you don't get a good picture of the relative contribution of the different parts, and you risk wasting effort on the wrong parts.
You pretty much have to use artificial workloads. Whenever you do that there is a risk that the workloads are not representative of "normal" workloads, and your profiling results are biased.
In many applications, the bottlenecks are due to the way that the parts of the application interact with each other, or with I/O or garbage collection. Profiling different parts of the application separately is likely to miss these interactions.
... what I am looking for is the technique
Roughly speaking, you start with the biggest "hotspots" identified by the profile data and drill down until you've figured out why so much time is being spent in a certain area. It really helps if your profiling tool can aggregate and present the data both top-down and bottom-up.
But, at the end of the day going from the profiling evidence (hotspots, stack snapshots, etc) to the root cause and the remedy is often down to the practical knowledge and intuition that comes from experience.
(Yea ... I'm waffling a bit. But my point is that there is no magic formula for doing this. Ultimately, you've got to use your brain ... like you have to when debugging a complex application.)
First I just time it with a watch to get an overall measurement.
Then I run it under a debugger and take stackshots. What these do is tell me which lines of code are responsible for large fractions of time. In particular, this means lines where functions are called without really needing to be, and I/O that I may not have been aware of.
Since it shows me lines of code that take time and can be done a better way, I fix those.
Then I start over at the top and see how much time I actually saved. I repeat these steps until I can no longer find things that a) take significant % of time, and b) I can fix.
This has been called "poor man's profiling". The little secret is not only is it cheap, but it is very effective, because it avoids the common myths about profiling.
P.S. If it is an interactive application, do all this just to the part of it that is slow, like if you press a "Do Useful Stuff" button, and it finishes a few seconds later. There's no point to taking stackshots when it's waiting for YOU.
P.P.S. Suppose there is some activity that should be faster but finishes too quickly to take stackshots, like if it takes a second but should take a fraction of a second. Then what you can do is (temporarily) wrap a for loop of 10 or 100 iterations around it. That will make it take long enough to get samples. After you've sped it up, remove the loop.
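For example (a trivial sketch; doQuickActivity() is a stand-in for the too-fast operation):

```java
public class StackshotHelper {
    static void doQuickActivity() { /* illustrative: the ~1 second operation */ }

    public static void main(String[] args) {
        // Temporarily repeat the activity so it runs long enough to sample.
        for (int i = 0; i < 100; i++) {
            doQuickActivity();
        }
        // Remove this loop once the activity has been sped up.
    }
}
```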
Take a look at http://www.ej-technologies.com/products/jprofiler/overview.html