Android GPU profiling - OpenGL Live Wallpaper is slow


I'm developing a Live Wallpaper using OpenGL ES 3.0. I've set up according to the excellent tutorial at http://www.learnopengles.com/how-to-use-opengl-es-2-in-an-android-live-wallpaper/, adapting GLSurfaceView and using it inside the Live Wallpaper.
I have a decent knowledge of OpenGL/GLSL best practices, and I've set up a simple rendering pipeline where the draw loop is as tight as possible. No re-allocations, using one static VBO for non-changing data, a dynamic VBO for updates, using only one draw call, no branching in the shaders et cetera. I usually get very good performance, but at seemingly random but reoccurring times, the framerate drops.
Profiling with the on-screen bars gives me intervals where the yellow bar ("waiting for commands to complete") shoots away and takes everything above the critical 60fps threshold.
I've read any resources on profiling and interpreting those numbers I can get my hands on, including the nice in-depth SO question here. However, the main takeaway from that question seems to be that the yellow bar indicates time spent on waiting for blocking operations to complete, and for frame dependencies. I don't believe I have any of those, I just draw everything at every frame. No reading.
My question is broad - but I'd like to know what things can cause this type of framerate drop, and how to move forward in pinning down the issue.
Here are some details that may or may not have impact:
I'm rendering on demand, onOffsetsChanged is the trigger (render when dirty).
There is one single texture (created and bound only once), 1024x1024 RGBA. Replacing the one texture2D call with a plain vec4 seems to help remove some of the framerate drops. Reducing the texture size to 512x512 does nothing for performance.
The shaders are not complex, and as stated before, contain no branching.
There is not much data in the scene. There are only ~300 vertices and the one texture.
A systrace shows no suspicious methods - the GL related methods such as buffer population and state calls are not on top of the list.
Update:
As an experiment, I tried rendering only every other frame instead of requesting a render on every onOffsetsChanged (swipe left/right). This was horrible for the look and feel, but it got rid of the yellow lag spikes almost completely. This seems to tell me that issuing 60 render requests per second is too much, but I can't figure out why.

My question is broad - but I'd like to know what things can cause this
type of framerate drop, and how to move forward in pinning down the
issue.
(1) Accumulation of render state. Make sure you "glClear" the color/depth/stencil buffers before you start each render pass (although if you are rendering directly to the window surface this is unlikely to be the problem, as state is guaranteed to be cleared every frame unless you set EGL_BUFFER_PRESERVE).
(2) Buffer/texture ghosting. Rendering is deeply pipelined, but OpenGL ES tries to present a synchronous programming abstraction. If you try to write to a buffer (SubBuffer update, SubTexture update, MapBuffer, etc) which is still "pending" use in a GPU operation still queued in the pipeline then you either have to block and wait, or you force a copy of that resource to be created. This copy process can be "really expensive" for large resources.
(3) Device DVFS (dynamic voltage and frequency scaling) can be quite sensitive on some devices, especially for content which happens to sit just around a level decision point between two frequencies. If the GPU or CPU frequency drops then you may well get a spike in the amount of time a frame takes to process. For debug purposes some devices provide a means to fix frequency via sysfs - although there is no standard mechanism.
(4) Thermal limitations - most modern mobile devices can produce more heat than they can dissipate if everything is running at high frequency, so the maximum performance point cannot be sustained. If your content is particularly heavy then you may find that thermal management kicks in after a "while" (1-10 minutes depending on device, in my experience) and forcefully drops the frequency until thermal levels drop within safe margins. This shows up as somewhat random increases in frame processing time, and is normally unpredictable once a device hits the "warm" state.
If it is possible to share an API sequence which reproduces the issue it would be easier to provide more targeted advice - the question is really rather general and OpenGL ES is a very wide API ;)
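One common way to sidestep the ghosting described in (2) is to rotate between several dynamic buffers, so the buffer written this frame is not the one a queued draw call may still be reading. The sketch below shows only the rotation logic in plain Java. `VboRing` is a hypothetical helper, the ids are placeholders for names that would come from `glGenBuffers`, and the ring size of three is an assumption; whether this helps on a given driver needs to be measured.

```java
/** Hypothetical helper: hands out buffer object names in round-robin order. */
final class VboRing {
    private final int[] bufferIds;
    private int next = 0;

    VboRing(int[] bufferIds) {
        if (bufferIds.length < 2) {
            throw new IllegalArgumentException("need at least two buffers");
        }
        this.bufferIds = bufferIds.clone();
    }

    /** Returns the buffer to write this frame and advances the ring. */
    int acquire() {
        int id = bufferIds[next];
        next = (next + 1) % bufferIds.length;
        return id;
    }
}

final class VboRingDemo {
    public static void main(String[] args) {
        // Placeholder ids; on Android these would be generated by glGenBuffers.
        VboRing ring = new VboRing(new int[] {11, 12, 13});
        for (int frame = 0; frame < 4; frame++) {
            int vbo = ring.acquire();
            // Real code would bind vbo, upload this frame's data, then draw.
            System.out.println("frame " + frame + " writes buffer " + vbo);
        }
    }
}
```

The trade-off is memory: each extra buffer in the ring is another full copy of the dynamic data, which is cheap for ~300 vertices but not for large resources.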

Related

LWJGL - Reason for cyclic freezes?

I am currently working on a 2D Game that uses LWJGL, but I have stumbled across some serious performance issues.
When I render more than ~100 sprites, the window freezes for a very small amount of time. I did some tests and I found out the following:
The problem occurs with both Vsync enabled or disabled
The problem occurs even if I cap the frames at 60
The program is not just rendering fewer frames for a short time; the rendering seems to actually pause
There are no other operations like Matrix-Calculations that slow down the program
I already have implemented batch rendering, but it does not seem to improve the performance
The frequency of the freezes increases with the amount of Sprites
My Graphics Card driver is up to date
The problem occurs although the framerate seems to be quite acceptable, with 100 rendered sprites at the same time, I have ~1500 fps, with 1000 sprites ~200 fps
I use a very basic shader, the transformation matrices are passed to the shader via uniform variables each rendering call (Once per sprite per frame). The size of the CPU/GPU bus shouldn't be an issue.
I have found a very similar issue here, but none of the suggested solutions work for me.
This is my first question here, please let me know if I am missing some important information.

It's probably GC.
Java is sadly not the best language for games, because of GC and the lack of any structures that can be allocated on the stack. Among languages similar to Java, C# is often a better choice because it has many more tools to control memory, such as stackalloc and structs in general.
So when writing a game in a language with GC, you should make sure your game loop does not allocate too many objects. In many cases, even in other languages, people try to go for zero or near-zero allocations in the loop.
You can create object pools for your entities/sprites, so you don't allocate new ones and just re-use existing ones.
And if it's a simple 2D game, then just avoiding allocating objects when there is no need to should probably be enough (like passing two ints instead of an object holding a location on the 2D map).
And you should use a profiler to confirm which changes are worth it.
There are also trickier solutions, like using manually allocated off-heap memory to store some data without object overhead, but I don't think a simple game will need such solutions. Typical game-dev solutions like pooling and avoiding unneeded objects should be enough.
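As a rough illustration of the pooling idea, here is a minimal sketch of an object pool. `SimplePool` and `PooledSprite` are made-up names for this example, not LWJGL classes, and a real pool would also reset an object's fields before handing it out again.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

/** Hypothetical minimal pool: reuses released objects instead of allocating. */
final class SimplePool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    SimplePool(Supplier<T> factory, int preallocate) {
        this.factory = factory;
        // Allocate up front, outside the game loop.
        for (int i = 0; i < preallocate; i++) {
            free.push(factory.get());
        }
    }

    /** Returns a pooled object, allocating only if the pool is empty. */
    T obtain() {
        return free.isEmpty() ? factory.get() : free.pop();
    }

    /** Hands an object back so a later obtain() can reuse it. */
    void release(T item) {
        free.push(item);
    }

    int available() {
        return free.size();
    }
}

/** Placeholder entity type for the example. */
final class PooledSprite {
    float x, y;
}

final class SimplePoolDemo {
    public static void main(String[] args) {
        SimplePool<PooledSprite> pool = new SimplePool<>(PooledSprite::new, 100);
        PooledSprite s = pool.obtain();   // no allocation here
        s.x = 10f;
        s.y = 20f;
        pool.release(s);                  // back in the pool instead of garbage
        System.out.println("free sprites: " + pool.available());
    }
}
```

Sizing the pool for the worst case keeps `obtain()` from ever falling back to the factory during gameplay.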

Frame Rate and draw flow in openGL

I cannot seem to understand how the frame drawing sync with buffer swapping.
Following are the questions:
1. Since most OpenGL calls are non-blocking (or buffered), how do you know if the GPU is done with the current frame?
2. Does OpenGL handle it so that an unfinished frame won't get swapped to the window buffer?
3. How do you calculate the frame rate? I mean, what is the basis for determining the number of frames drawn or the time taken by each frame?

The modern way of syncing with the GL is using sync objects. Using glFinish() (or other blocking GL calls) has the disadvantage of stalling both the GPU and the CPU (thread): the CPU will wait until the GPU is finished, and the GPU then stalls because there is no new work queued up. If sync objects are used properly, both can be completely avoided.
You just insert a fence sync into the GL command stream at any point you are interested in, and later you can check whether all commands before it have completed, or you can wait for the completion (while you can still have further commands queued up).
Note that for frame rate estimation, you don't need any explicit means of synchronization. Just using SwapBuffers() is sufficient. The GPU might queue up a few frames in advance (the NVIDIA driver even has a setting for this), but this won't disturb fps counting, since only the first n frames are queued up. Just count the number of SwapBuffers() calls issued each second, and you will do fine. If the user has enabled sync to vblank, the frame rate will be limited to the refresh rate of the monitor, and no tearing will appear.
If you need more detailed GPU timing statistics (but for a frame rate counter, you don't), you should have a look at timer queries.
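The "count the swaps each second" approach can be sketched as below. `FpsCounter` is a hypothetical helper, and the caller is assumed to invoke `onSwap` right after each buffer swap with a monotonic timestamp such as `System.nanoTime()`.

```java
/** Hypothetical helper: counts buffer swaps per one-second window. */
final class FpsCounter {
    private static final long WINDOW_NANOS = 1_000_000_000L;

    private boolean started = false;
    private long windowStart;
    private int frames = 0;
    private int lastFps = 0;

    /** Call once per buffer swap with a monotonic timestamp in nanoseconds. */
    void onSwap(long nowNanos) {
        if (!started) {
            started = true;
            windowStart = nowNanos;
        }
        if (nowNanos - windowStart >= WINDOW_NANOS) {
            // A full second has passed: publish the count and start over.
            lastFps = frames;
            frames = 0;
            windowStart = nowNanos;
        }
        frames++;
    }

    /** Number of swaps counted in the most recently completed window. */
    int fps() {
        return lastFps;
    }
}

final class FpsCounterDemo {
    public static void main(String[] args) {
        FpsCounter counter = new FpsCounter();
        // Simulated timestamps standing in for real swaps at a steady 60 Hz.
        for (int i = 0; i <= 60; i++) {
            counter.onSwap(i * 1_000_000_000L / 60);
        }
        System.out.println("fps: " + counter.fps());
    }
}
```

Passing the timestamp in, rather than reading the clock inside the counter, keeps the class easy to check with simulated frame times.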

Question 1 and 2: Invoke glFinish() instead of glFlush():
Description
glFinish does not return until the effects of all previously called GL commands are complete. Such effects include all changes to GL state, all changes to connection state, and all changes to the frame buffer contents.
Question 3: Start a Timer and count how many calls to glFinish() were executed within one second.

1. Generally, you don't. And I can't think of a very good reason why you should ever worry about it in a real application. If you really have to know, using sync objects (as already suggested in the answer by @derhass) is your best option in modern OpenGL. But you should make sure that you have a clear understanding of why you need it, because it seems unusual to me.
2. Yes. While the processing of calls in OpenGL is mostly asynchronous, the sequence of calls is still maintained. So if you make a SwapBuffers call, it will guarantee that all the calls you made before the SwapBuffers call will have completed before the buffers are swapped.
3. There's no good easy way to measure the time used for a single frame. The most practical approach is to render for a sufficiently long time (at least a few seconds seems reasonable). You count the number of frames you rendered during this time, and the elapsed wall clock time. Then divide the number of frames by the time taken to get a frame rate.
Some of the above is slightly simplified, because this opens up some areas that could be very broad. For example, you can use timer queries to measure how long the GPU takes to process a given frame. But you have to be careful about the conclusions you draw from it.
As a hypothetical example, say you render at 60 fps, limited by vsync. You put a timer query on a frame, and it tells you that the GPU spent 15 ms to render the frame. Does this mean that you were right at the limit of being able to maintain 60 fps? And making your rendering/content more complex would drop it below 60 fps? Not necessarily. Unless you also tracked the GPU clock frequency, you don't know if the GPU really ran at its limit. Power management might have reduced the frequency/voltage to the level necessary to process the current workload. And if you give it more work, it might be able to handle it just fine, and still run at 60 fps.

How do setCache() and CacheHint work together in JavaFX?

With regard to JavaFX, I have the following questions:
Do I need to use setCache(true) on a Node for a cache hint set by setCacheHint() to actually have any effect?
Should calling setCache actually improve performance i.e. frame rate some or most of the time? I am unable to observe any change in frame rate when I use setCache(true) and I apply scaling and other transforms.

Do I need to use setCache(true) on a Node for a cache hint set by setCacheHint() to actually have any effect?
Yes.
The cache property is a hint to the system whether the node rendering should be cached (as an internal image) at all or not.
The cacheHint property is a hint to the system of what transforms are expected on the node so that the caching operation can be optimized for those transform types, (e.g. the cache hints rotate, scale or speed, etc).
If the node is not set to be cached at all, the cacheHint is irrelevant.
Should calling setCache actually improve performance i.e. frame rate some or most of the time?
Not necessarily. JavaFX has a default frame rate cap of 60fps, so if performance is good enough that the frame rate is reached even without any cache hints, you won't see any visible difference. This is the case for many basic animations and transforms.
Even if frame rate is not improved, the cache hint may make each transform a bit more efficient to perform so that it is less CPU or GPU intensive (usually by trading visual quality).
There may be other things which have a far greater impact on your frame rate. These things may have nothing to do with rendering speed of cacheable items (for example a long running operation executed on the JavaFX application thread during a game loop or constantly changing node content).
I have used a combination of setCache(true) and setCacheHint(CacheHint.SPEED) in small games I have written that featured multiple simultaneously animated nodes with multiple effects and translucency applied to the nodes. The settings did speed things up a lot (Mac OS X, Macbook Air 2012, Java FX 2.2).
Rather than relying on hints to the rendering system, you can also manually take a snapshot of a node tree and manually replace the node tree with the snapshot Image. This snapshot technique is not always the best way to go, but it does give you an alternative if the hints aren't working out well in your case.

Libgdx game logic in Render?

I'm learning Libgdx and have some questions about updating my game logic during the render method..
I would ideally like to keep my game logic and my rendering separate. The reason for this is that if I have a high FPS on a system, my game loop would "run" faster.
What I am looking for is to keep the experience constant and possibly limit my updates. Can anyone point me towards a tutorial on how to
a) Limit my render updates via delta time
b) Limit my game logic updates via delta time.
Thank you :)

After re-reading your question, I think the trick that you are missing (based on your comment that running on a higher-refresh system would result in your game logic running faster), is that you actually scale your updates based on the "delta" time that is passed to render. Andrei Bârsan mentions this above, but I thought I'd elaborate a bit on how delta is used.
For instance, within my game's render(), I first call my entityUpdate(delta), which updates and moves all of the objects in my game scaled by the distance traveled in time "delta" (it doesn't render them, just moves their position variables). Then I call entityManageCollisions(delta), which resolves all of the collisions caused by the update, then I finally call entityDraw(batch, delta), which uses delta to get the right frames for sprite animations, and actually draws everything on the screen.
I use a variant of an Entity/Component/System model so I handle all of my entities generically, and those method calls I mention above are essentially "Systems" that act on Entities with certain combinations of components on them.
So, all that to say, pass delta (the parameter passed into render()) into all of your logic, so you can scale things (move entities the appropriate distance) based on the amount of time that has elapsed since the last call. This requires that you set your speeds based on units / second for your entities, since you're passing in a value to scale them by that is a fraction of a second. Once you do it a few times, and experiment with the results, you'll be in good shape.
Also note: this will drive you insane in interactive debug sessions, since the delta timer keeps accumulating time since the last render call. Your entities generally get sub-second updates, but may wind up getting passed 30 seconds (or however long you spent stepping through the debugger), causing them to fly across the whole screen (and beyond; it tests those boundaries for you!). So at the very top of my render(), I have a line that says delta = 0.016036086f;. That number was a sample delta from my dev workstation, and seems to give decent results. You can capture your video system's typical delta by writing it to the console during a test run, and use that value instead, if you like. I comment the line out for builds to be deployed, but leave it un-commented when debugging, so each frame moves the game forward a consistent amount, regardless of how long I spend looking at things in the debugger.
Good luck!
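To make the delta scaling concrete, here is a minimal sketch in plain Java with no Libgdx types. `MovingEntity` and `DeltaClamp` are hypothetical names. Note that the clamp is a swapped-in alternative to the hard-coded debug delta described above, and the 0.05 second cap is an arbitrary assumption.

```java
/** Hypothetical entity whose speed is expressed in units per second. */
final class MovingEntity {
    float x;
    final float speedUnitsPerSecond;

    MovingEntity(float speedUnitsPerSecond) {
        this.speedUnitsPerSecond = speedUnitsPerSecond;
    }

    /** Moves the entity by the distance covered in deltaSeconds. */
    void update(float deltaSeconds) {
        x += speedUnitsPerSecond * deltaSeconds;
    }
}

/** Hypothetical guard against huge deltas, e.g. after pausing in a debugger. */
final class DeltaClamp {
    static final float MAX_DELTA_SECONDS = 0.05f; // assumed cap, tune to taste

    static float clamp(float deltaSeconds) {
        return Math.min(deltaSeconds, MAX_DELTA_SECONDS);
    }
}

final class DeltaDemo {
    public static void main(String[] args) {
        MovingEntity fast = new MovingEntity(100f);
        MovingEntity slow = new MovingEntity(100f);
        // Same half second of game time, split into a different number of frames.
        for (int i = 0; i < 2; i++) fast.update(0.25f);
        for (int i = 0; i < 4; i++) slow.update(0.125f);
        System.out.println(fast.x + " vs " + slow.x);
        // A 30 second stall in the debugger is cut down to the cap.
        System.out.println(DeltaClamp.clamp(30f));
    }
}
```

Inside a Libgdx render(float delta) you would pass `DeltaClamp.clamp(delta)` (or the raw delta) into every update call, so both entities above end up in the same place regardless of frame rate.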

The answer so far isn't using parallel threads - I've had this question myself in the past and I've been advised against it - link. A good idea would be to run the world update first, and then skip the rendering if there isn't enough time left in the frame for it. Delta times should be used nevertheless to keep everything going smooth and prevent lagging.
If using this approach, it would be wise to prevent more than X consecutive frame skips from happening, since in the (unlikely, but possible, depending on how much update logic there is compared to rendering) case that the update logic lasts more than the total time allocated for a frame, this could mean that your rendering never happens - and that isn't something that you'd want. By limiting the numbers of frames you skip, you ensure the updates can run smoothly, but you also guarantee that the game doesn't freeze when there's too much logic to handle.
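The skip-with-a-limit idea might look something like the sketch below. `FrameSkipPolicy` is a hypothetical class, and the caller is assumed to supply both the time left in the frame and an estimate of the render cost, neither of which Libgdx provides directly.

```java
/** Hypothetical policy: skip rendering when short on time, but never too often. */
final class FrameSkipPolicy {
    private final int maxConsecutiveSkips;
    private int skipped = 0;

    FrameSkipPolicy(int maxConsecutiveSkips) {
        this.maxConsecutiveSkips = maxConsecutiveSkips;
    }

    /**
     * Call after the world update.
     * Returns true if this frame should be rendered.
     */
    boolean shouldRender(long nanosLeftInFrame, long estimatedRenderNanos) {
        boolean enoughTime = nanosLeftInFrame >= estimatedRenderNanos;
        if (enoughTime || skipped >= maxConsecutiveSkips) {
            // Either there is time, or we have skipped too often: draw anyway.
            skipped = 0;
            return true;
        }
        skipped++;
        return false;
    }
}

final class FrameSkipDemo {
    public static void main(String[] args) {
        FrameSkipPolicy policy = new FrameSkipPolicy(2);
        // Updates keep eating the whole frame: 1 ms left, render needs 5 ms.
        for (int frame = 0; frame < 6; frame++) {
            boolean render = policy.shouldRender(1_000_000L, 5_000_000L);
            System.out.println("frame " + frame + (render ? ": render" : ": skip"));
        }
    }
}
```

With a limit of 2, even a permanently overloaded update loop still gets a rendered frame after every two skips, so the screen never freezes completely.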

Graphics spoilt by intermittent jumps

I have written a game app with bitmaps moving around the screen. It employs a separate thread which writes directly to a canvas. On my Samsung Galaxy Y the animations seems smooth throughout the game, however on a "Tabtech m7" tablet the smooth graphics appear to be interrupted by intermittent freezes of about half a second duration, and spaced about three or four seconds apart. Is it possible that it is just a feature of the (cheap) tablet hardware, or is it more likely that it's some aspect of my programming? And if it's me, how could I go about diagnosing the cause?

Have a look in your log to see if the garbage collector is running at approximately the times you get the freezes. If so, you could try to find out whether it is you or the system that is allocating memory in an inappropriate way.
In DDMS you can have a look at the Allocation Tracker, which could possibly tell you what's going on.

Yes, echoing erbsman. To avoid GC, make sure you're not allocating any new objects in your game loop. Also, GCs can be kicked off if you do a lot of string conversions (e.g., updating the score), such as calling Integer.toString(10).
