How do setCache() and CacheHint work together in JavaFX?

With regard to JavaFX, I have the following questions:
Do I need to use setCache(true) on a Node for a cache hint set by setCacheHint() to actually have any effect?
Should calling setCache() actually improve performance (i.e., frame rate) some or most of the time? I am unable to observe any change in frame rate when I use setCache(true) and apply scaling and other transforms.

Do I need to use setCache(true) on a Node for a cache hint set by setCacheHint() to actually have any effect?
Yes.
The cache property is a hint to the system about whether the node's rendering should be cached (as an internal image) at all.
The cacheHint property is a hint to the system about which transforms are expected on the node, so that the caching operation can be optimized for those transform types (e.g. the CacheHint values ROTATE, SCALE, or SPEED).
If the node is not set to be cached at all, the cacheHint is irrelevant.
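A minimal sketch of the two calls used together (the node and the hint choice are just illustrative):

import javafx.scene.CacheHint;
import javafx.scene.shape.Circle;

Circle ball = new Circle(20);
// Without setCache(true), any cacheHint is ignored.
ball.setCache(true);
// Tell the system the node will mostly be moved/scaled quickly, so the
// cached image can be reused with faster, lower-quality transforms.
ball.setCacheHint(CacheHint.SPEED);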
Should calling setCache() actually improve performance (i.e., frame rate) some or most of the time?
Not necessarily. JavaFX has a default frame rate cap of 60fps, so if performance is already good enough to hit that cap even without any cache hints, you won't see any visible difference. This is the case for many basic animations and transforms.
Even if the frame rate is not improved, the cache hint may make each transform cheaper to perform, so that it is less CPU or GPU intensive (usually by trading away some visual quality).
There may be other things which have a far greater impact on your frame rate, and which have nothing to do with the rendering speed of cacheable items (for example, a long-running operation executed on the JavaFX application thread during a game loop, or constantly changing node content).
I have used a combination of setCache(true) and setCacheHint(CacheHint.SPEED) in small games I have written that featured multiple simultaneously animated nodes with multiple effects and translucency applied to them. The settings did speed things up a lot (OS X, 2012 MacBook Air, JavaFX 2.2).
Rather than relying on hints to the rendering system, you can also take a snapshot of a node tree yourself and replace that node tree with the snapshot image. This snapshot technique is not always the best way to go, but it does give you an alternative if the hints aren't working out well in your case.
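For example, something along these lines (complexNode and parentPane are placeholder names):

import javafx.scene.SnapshotParameters;
import javafx.scene.image.ImageView;
import javafx.scene.image.WritableImage;

// Render the node tree once into an image...
WritableImage snapshot = complexNode.snapshot(new SnapshotParameters(), null);
// ...then swap the live (expensive) node tree for a cheap ImageView.
int index = parentPane.getChildren().indexOf(complexNode);
parentPane.getChildren().set(index, new ImageView(snapshot));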

Related

Android GPU profiling - OpenGL Live Wallpaper is slow

I'm developing a Live Wallpaper using OpenGL ES 3.0. I've set it up according to the excellent tutorial at http://www.learnopengles.com/how-to-use-opengl-es-2-in-an-android-live-wallpaper/, adapting GLSurfaceView and using it inside the Live Wallpaper.
I have a decent knowledge of OpenGL/GLSL best practices, and I've set up a simple rendering pipeline where the draw loop is as tight as possible: no re-allocations, one static VBO for non-changing data, a dynamic VBO for updates, only one draw call, no branching in the shaders, et cetera. I usually get very good performance, but at seemingly random yet recurring times, the framerate drops.
Profiling with the on-screen bars gives me intervals where the yellow bar ("waiting for commands to complete") shoots away and takes everything above the critical 60fps threshold.
I've read every resource on profiling and interpreting those numbers that I can get my hands on, including the nice in-depth SO question here. However, the main takeaway from that question seems to be that the yellow bar indicates time spent waiting for blocking operations to complete, and for frame dependencies. I don't believe I have any of those; I just draw everything at every frame. No reading.
My question is broad - but I'd like to know what things can cause this type of framerate drop, and how to move forward in pinning down the issue.
Here are some details that may or may not have impact:
I'm rendering on demand; onOffsetsChanged is the trigger (render when dirty).
There is one single texture (created and bound only once), 1024x1024 RGBA. Replacing the one texture2D call with a plain vec4 seems to help remove some of the framerate drops. Reducing the texture size to 512x512 does nothing for performance.
The shaders are not complex, and as stated before, contain no branching.
There is not much data in the scene. There are only ~300 vertices and the one texture.
A systrace shows no suspicious methods - the GL-related methods such as buffer population and state calls are not at the top of the list.
Update:
As an experiment, I tried rendering only every other frame, not requesting a render on every onOffsetsChanged (swipe left/right). This was horrible for the look and feel, but got rid of the yellow lag spikes almost completely. This seems to tell me that issuing 60 render requests per second is too much, but I can't figure out why.
My question is broad - but I'd like to know what things can cause this type of framerate drop, and how to move forward in pinning down the issue.
(1) Accumulation of render state. Make sure you "glClear" the color/depth/stencil buffers before you start each render pass (although if you are rendering directly to the window surface this is unlikely to be the problem, as state is guaranteed to be cleared every frame unless you set EGL_BUFFER_PRESERVE).
(2) Buffer/texture ghosting. Rendering is deeply pipelined, but OpenGL ES tries to present a synchronous programming abstraction. If you try to write to a buffer (SubBuffer update, SubTexture update, MapBuffer, etc) which is still "pending" use in a GPU operation still queued in the pipeline then you either have to block and wait, or you force a copy of that resource to be created. This copy process can be "really expensive" for large resources.
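If ghosting is the suspect, one common mitigation is buffer "orphaning": re-specify the buffer's data store before updating it, so the driver can hand back fresh storage instead of stalling or copying. A sketch with android.opengl.GLES20 (vboId, sizeInBytes and vertexData are placeholders):

// Orphan the old storage; the driver keeps the in-flight copy alive
// for any draw calls still queued in the pipeline.
GLES20.glBindBuffer(GLES20.GL_ARRAY_BUFFER, vboId);
GLES20.glBufferData(GLES20.GL_ARRAY_BUFFER, sizeInBytes, null, GLES20.GL_DYNAMIC_DRAW);
// Upload this frame's data into the freshly allocated storage.
GLES20.glBufferSubData(GLES20.GL_ARRAY_BUFFER, 0, sizeInBytes, vertexData);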
(3) Device DVFS (dynamic frequency and voltage scaling) can be quite sensitive on some devices, especially for content which happens to sit just around a level decision point between two frequencies. If the GPU or CPU frequency drops then you may well get a spike in the amount of time a frame takes to process. For debug purposes some devices provide a means to fix the frequency via sysfs - although there is no standard mechanism.
(4) Thermal limitations - most modern mobile devices can produce more heat than they can dissipate if everything is running at high frequency, so the maximum performance point cannot be sustained. If your content is particularly heavy then you may find that thermal management kicks in after a "while" (1-10 minutes depending on device, in my experience) and forcefully drops the frequency until thermal levels drop within safe margins. This shows up as somewhat random increases in frame processing time, and is normally unpredictable once a device hits the "warm" state.
If it is possible to share an API sequence which reproduces the issue it would be easier to provide more targeted advice - the question is really rather general and OpenGL ES is a very wide API ;)

Frame Rate and draw flow in OpenGL

I cannot seem to understand how frame drawing syncs with buffer swapping.
Following are the questions:
1. Since most of the OpenGL calls are non-blocking (or buffered), how do you know if the GPU is done with the current frame?
2. Does OpenGL handle it so that an unfinished frame won't get swapped to the window buffer?
3. How do you calculate the frame rate? I mean, what is the basis for determining the number of frames drawn or the time taken by each frame?
The modern way of syncing with the GL is using sync objects. Using glFinish() (or other blocking GL calls) has the disadvantage of stalling both the GPU and the CPU (thread): the CPU will wait until the GPU is finished, and the GPU then stalls because there is no new work queued up. If sync objects are used properly, both can be completely avoided.
You just insert a fence sync into the GL command stream at any point you are interested in, and later you can check if all commands before it have completed, or you can wait for their completion (while you can still have further commands queued up).
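A sketch of the pattern (shown here with android.opengl.GLES30; the desktop GL 3.2+ entry points are the same):

// Insert a fence after submitting this frame's commands.
long fence = GLES30.glFenceSync(GLES30.GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

// Later: poll with a zero timeout to see whether the GPU has passed it...
int status = GLES30.glClientWaitSync(fence, 0, 0);
boolean done = status == GLES30.GL_ALREADY_SIGNALED
        || status == GLES30.GL_CONDITION_SATISFIED;

// ...or block until it has (flushing pending commands first).
GLES30.glClientWaitSync(fence, GLES30.GL_SYNC_FLUSH_COMMANDS_BIT,
        GLES30.GL_TIMEOUT_IGNORED);
GLES30.glDeleteSync(fence);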
Note that for frame rate estimation, you don't need any explicit means of synchronization. Just using SwapBuffers() is sufficient. The GPU might queue up a few frames in advance (the NVIDIA driver even has a setting for this), but this won't disturb FPS counting, since only the first n frames are queued up. Just count the number of SwapBuffers() calls issued each second, and you will do fine. If the user has enabled sync to vblank, the frame rate will be limited to the refresh rate of the monitor, and no tearing will appear.
If you need more detailed GPU timing statistics (but for a frame rate counter, you don't), you should have a look at timer queries.
Questions 1 and 2: Invoke glFinish() instead of glFlush():
Description
glFinish does not return until the effects of all previously called GL commands are complete. Such effects include all changes to GL state, all changes to connection state, and all changes to the frame buffer contents.
Question 3: Start a timer and count how many calls to glFinish() are executed within one second.
Generally, you don't. And I can't think of a very good reason why you should ever worry about it in a real application. If you really have to know, using sync objects (as already suggested in the answer by #derhass) is your best option in modern OpenGL. But you should make sure that you have a clear understanding of why you need it, because it seems unusual to me.
Yes. While the processing of calls in OpenGL is mostly asynchronous, the sequence of calls is still maintained. So if you make a SwapBuffers call, it will guarantee that all the calls you made before the SwapBuffers call will have completed before the buffers are swapped.
There's no good, easy way to measure the time used for a single frame. The most practical approach is to render for a sufficiently long time (at least a few seconds seems reasonable). You count the number of frames you rendered during this time and the elapsed wall clock time, then divide the number of frames by the time taken to get a frame rate.
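A minimal sketch of such a counter (renderFrame, swapBuffers and running are placeholders for your own loop):

long frames = 0;
long startNanos = System.nanoTime();
while (running) {
    renderFrame();   // issue the GL calls for one frame
    swapBuffers();   // e.g. an eglSwapBuffers wrapper
    frames++;
    long elapsed = System.nanoTime() - startNanos;
    if (elapsed >= 1_000_000_000L) {
        System.out.printf("%.1f fps%n", frames * 1e9 / elapsed);
        frames = 0;
        startNanos = System.nanoTime();
    }
}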
Some of the above is slightly simplified, because this opens up some areas that could be very broad. For example, you can use timer queries to measure how long the GPU takes to process a given frame. But you have to be careful about the conclusions you draw from it.
As a hypothetical example, say you render at 60 fps, limited by vsync. You put a timer query on a frame, and it tells you that the GPU spent 15 ms to render the frame. Does this mean that you were right at the limit of being able to maintain 60 fps? And making your rendering/content more complex would drop it below 60 fps? Not necessarily. Unless you also tracked the GPU clock frequency, you don't know if the GPU really ran at its limit. Power management might have reduced the frequency/voltage to the level necessary to process the current workload. And if you give it more work, it might be able to handle it just fine, and still run at 60 fps.

Time manipulation/simulation and event scheduling in Java

I'm currently prototyping a multimedia editing application in Java (pretty much like Sony Vegas or Adobe After Effects) geared towards a slightly different end.
Now, before reinventing the wheel, I'd like to ask if there's any library out there geared towards time simulation/manipulation.
What I mean, specifically: an ideal solution would be a library that can:
Schedule and generate events based on an elastic time factor. For example, real time would have a factor of 1.0, slow motion any lower value, and time speedups any higher value.
Provide configurable granularity. In other words, a way to specify how frequently time-based events will fire (30 frames per second, 60 fps, etc.).
An event execution mechanism, of course: a way to define that an event starts and terminates at certain points in time, and to get notified accordingly.
Is there any Java framework out there that can do this?
Thank you for your time and help!
Well, it seems that no such thing exists for Java. However, I found out that this is a specific case of a more general problem.
http://gafferongames.com/game-physics/fix-your-timestep/
Using fixed time stepping, my application can have frame skipping for free (i.e. when doing live preview rendering) and render with no time constraints when in offline mode, which is pretty much what Vegas and other multimedia programs do.
Also, by using a delta factor between each frame, the whole simulation can be sped up or slowed down at will. So yeah, fixed time stepping pretty much nails it for me.
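The core of the loop, as a sketch (update, render and running are placeholders; the step and scale values are just examples):

final double dt = 1.0 / 60.0;  // granularity: 60 simulation steps per second
double timeScale = 1.0;        // 1.0 = real time, <1.0 slow motion, >1.0 speedup
double accumulator = 0.0;
long previous = System.nanoTime();
while (running) {
    long now = System.nanoTime();
    accumulator += (now - previous) / 1e9 * timeScale;
    previous = now;
    while (accumulator >= dt) {
        update(dt);            // fire time-based events for this fixed step
        accumulator -= dt;
    }
    render();                  // free to skip or repeat frames
}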

How to tell how much memory an Android app has left to use?

I'm writing an image manipulation tool and, for image quality reasons, want to edit the image in as large a size as possible.
I need to be careful that I don't run out of memory while my app is running. In addition, I want to take advantage of devices with lots of memory. For example, I'd like to support medium size images on less powerful devices like the G1 and let more powerful devices like the Samsung Galaxy S support large images.
Is there a reliable way to gauge how much memory my app can use?
My current idea is that, when deciding on the image size to load, the device could calculate the image size that brings memory usage for the app as close to about 70% as possible (i.e. not 100% as you don't want the app crashing on small allocations later or because of memory fragmentation)
I've found I can use the following code to return a percentage of the available memory:
// Dalvik heap in use plus native allocations (bitmaps live on the native heap here)
double totalMemoryUsed = Runtime.getRuntime().totalMemory() + android.os.Debug.getNativeHeapAllocatedSize();
// ...as a percentage of the hard per-app heap limit
int percentUsed = (int) (totalMemoryUsed / Runtime.getRuntime().maxMemory() * 100);
New bitmaps add to the native heap allocated part. This seems reliable in that my app will predictably crash with an out of memory error when percentUsed goes above 100%. Are there any caveats to the above calculation I need to be aware of?
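To make the sizing idea concrete, the calculation I have in mind is roughly this (a sketch; it assumes 4 bytes per pixel, i.e. ARGB_8888):

long maxHeap = Runtime.getRuntime().maxMemory();
long used = Runtime.getRuntime().totalMemory()
        + android.os.Debug.getNativeHeapAllocatedSize();
long budget = (long) (maxHeap * 0.70) - used;  // bytes left before hitting 70%
if (budget > 0) {
    long maxPixels = budget / 4;               // ARGB_8888 = 4 bytes per pixel
    int maxSide = (int) Math.sqrt(maxPixels);  // edge of the largest square bitmap
    // load/scale the image to at most maxSide x maxSide
}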
Are there any caveats to the above calculation I need to be aware of?
My first thought is that you really can only rely on the accuracy of a calculation like that at the instant when you perform it. Android has all sorts of services and other applications that are running in the background. At any given instant, the amount of memory currently in use is going to change. There is no way to tell by how much until you perform a calculation again. Additionally, no two phones are going to be alike so you really can't accurately estimate it.
You would almost need a thread or async task recalculating this on a constant basis, which I'm not sure is a good idea.
I would steer away from this, and consider building your app using a different strategy. But good luck to you if you continue to pursue it.

How do I make for loops run side by side?

I have been working on a childish little program: there are a bunch of little circles on the screen, of different colors and sizes. When a larger circle encounters a smaller circle it eats the smaller circle, and when a circle has eaten enough other circles it reproduces. It's kind of neat!
However, the way I have it implemented, the process of detecting nearby circles and checking them for edibility is done with a for loop that cycles through the entire living population of circles... which takes longer and longer as the population tends to spike into the 3000s before it starts to drop. The process doesn't slow my computer down - I can go off and play Dawn of War or whatever and there isn't any slowdown - it's just the process of checking every circle to see if it has collided with every other circle...
So what occurred to me, is that I could try to separate the application window into four quadrants, and have the circles in the quadrants do their checks simultaneously, since they would have almost no chance of interfering with each other: or something to that effect!
My question, then, is: how does one make for loops that run side by side? In Java, say.
The problem you have here can actually be solved without threads.
What you need is a spatial data structure. A quadtree would be best, or, if the field in which the circles move is fixed (I assume it is), you could use a simple grid. Here's the idea:
Divide the display area into a square grid where each cell is at least as big as your biggest circle. For each cell, keep a list (a linked list is best) of all the circles whose center is in that cell. Then, during the collision detection step, go through each cell and check each circle in that cell against all the other circles in that cell and the surrounding cells.
Technically, you don't have to check all the cells around each one, as some of them will have already been checked.
You can combine this technique with multithreading to get even better performance.
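A rough sketch of the grid (assuming a Circle class with public x, y and radius fields; names are illustrative):

import java.util.ArrayList;
import java.util.List;

class Grid {
    final int cols, rows;
    final double cellSize;          // must be >= the largest circle's diameter
    final List<Circle>[] cells;

    @SuppressWarnings("unchecked")
    Grid(double width, double height, double cellSize) {
        this.cellSize = cellSize;
        cols = (int) Math.ceil(width / cellSize);
        rows = (int) Math.ceil(height / cellSize);
        cells = new List[cols * rows];
        for (int i = 0; i < cells.length; i++) {
            cells[i] = new ArrayList<>();
        }
    }

    // File each circle under the cell containing its center.
    void insert(Circle c) {
        cells[cellIndex(c.x, c.y)].add(c);
    }

    // Collision candidates: the circle's own cell plus its 8 neighbours.
    List<Circle> nearby(Circle c) {
        List<Circle> result = new ArrayList<>();
        int col = clamp((int) (c.x / cellSize), cols - 1);
        int row = clamp((int) (c.y / cellSize), rows - 1);
        for (int r = Math.max(0, row - 1); r <= Math.min(rows - 1, row + 1); r++) {
            for (int k = Math.max(0, col - 1); k <= Math.min(cols - 1, col + 1); k++) {
                result.addAll(cells[r * cols + k]);
            }
        }
        return result;
    }

    private int cellIndex(double x, double y) {
        return clamp((int) (y / cellSize), rows - 1) * cols
                + clamp((int) (x / cellSize), cols - 1);
    }

    private static int clamp(int v, int max) {
        return Math.max(0, Math.min(max, v));
    }
}

Rebuild (or update) the grid each frame, then for every circle only test intersection against nearby(circle) instead of the whole population.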
Computers are usually single-tasked, meaning they can execute one instruction at a time per CPU or core.
However, as you have noticed, your operating system (and other programs) appear to run many tasks at the same time.
This is accomplished by splitting the work into processes, and each process can further implement concurrency by spawning threads. The operating system then switches between each process and thread very quickly to give the illusion of multitasking.
In your situation, your Java program is a single process, and you would need to create 4 threads, each running its own loop. It can get tricky, because threads need to synchronize their access to shared variables, to prevent one thread from editing a variable while another thread is trying to access it.
Because threading is a complex subject, it would take far more explaining than I can do here.
However, you can read Sun's excellent tutorial on concurrency, which covers everything you need to know:
http://java.sun.com/docs/books/tutorial/essential/concurrency/
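As a sketch of the thread split with java.util.concurrent (circles and checkCollisions are placeholders for your own list and per-circle check):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

int nThreads = Runtime.getRuntime().availableProcessors();
ExecutorService pool = Executors.newFixedThreadPool(nThreads);
int chunk = (circles.size() + nThreads - 1) / nThreads;
for (int t = 0; t < nThreads; t++) {
    final int from = t * chunk;
    final int to = Math.min(circles.size(), from + chunk);
    pool.execute(() -> {
        // Each worker reads a disjoint slice; collect any results
        // (e.g. "eaten" circles) per worker and merge afterwards.
        for (int i = from; i < to; i++) {
            checkCollisions(circles.get(i));
        }
    });
}
pool.shutdown();
try {
    pool.awaitTermination(1, TimeUnit.MINUTES);  // wait for all slices
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}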
What you're looking for is not a way to have these run simultaneously (as people have noted, this depends on how many cores you have, and can offer at best a 2x or maybe 4x speedup), but instead a way to cut down on the number of collision checks you have to perform.
You should look into using a quadtree. In brief, you recursively break down your 2D region into four quadrants (as needed), and then only need to detect collisions between objects in nearby quadrants. In good cases, it can effectively reduce your collision detection time from N^2 to N * log N.
Instead of trying to do parallel processing, you may want to look at collision detection optimization, because in many situations performing fewer calculations in one thread is better than distributing the calculations among multiple threads, and it's easy to shoot yourself in the foot in this multithreading business. Try googling "collision detection algorithm" and see where it gets you ;)
If your computer has multiple processors or multiple cores, then you could easily run multiple threads and run smaller parts of the loops in each thread. Many PCs these days do have multiple cores - so create n threads and have each thread handle 1/nth of the loop count.
If you really want to get into concurrent programming, you need to learn how to use threads.
Sun has a tutorial for programming Java threads here:
http://java.sun.com/docs/books/tutorial/essential/concurrency/
This sounds quite similar to an experiment of mine - check it out...
http://tinyurl.com/3fn8w8
I'm also interested in quadtrees (which is why I'm here)... hope you figured it all out.
