I have been developing an Android app using OpenGL ES 1.0 for quite some time, using a naive approach to rendering: basically making a call to glColor4f(...) and glDrawArrays(...) with FloatBuffers each frame. I am hitting a point where graphics are becoming a huge bottleneck as I add more UI elements and the number of draw calls increases.
So I'm now looking for the best way to group all of these calls into one (or two or three) draw calls. It looks like the cleanest, most efficient, and canonical way to do this is to use VBOs, available from OpenGL ES 2.0 on. However, this would require a HUGE refactoring on my part to switch my whole graphics backend from ES 1.0 to ES 2.0. I am not sure if this is a good decision, or if there are acceptable ways to group my draw calls in 1.0 that would work fine for relatively simple 2D data (squares, rounded-rectangle TRIANGLE_FANs, etc.), or if it really might be worth biting the bullet and making the switch. I might also mention that I have a HEAVY reliance on the translation and scaling that is so convenient with the fixed pipeline of ES 1.0.
Looking around, I am surprised to find almost NO people in my position talking about the tradeoffs and complexity at hand for such a switch. Any thoughts?
I have a HEAVY reliance on translation and scaling
Note that you can't batch anything if you change the model-view matrix between draw calls. (ES2 didn't change that.)
VBOs are available from OpenGL ES 1.1, and they are probably available on the device you are targeting, even for ES 1.0 (ARB_vertex_buffer_object).
You can create a big VBO with world-space geometry (i.e. resolve scaling and translation on the CPU) and draw that. Even if you update this VBO each frame, in my experience it's fast enough. Sending thousands of small draw calls is almost always the slowest option.
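A minimal sketch of that approach, assuming the Android GLES11 bindings; the QuadBatch class and all its names are mine, for illustration only:

```java
import android.opengl.GLES11;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical sketch: batch many 2D quads into one VBO,
// pre-transformed into world space on the CPU.
public class QuadBatch {
    private final int vboId;
    private final FloatBuffer vertices;   // x,y per vertex, 6 vertices per quad
    private int vertexCount;

    public QuadBatch(int maxQuads) {
        vertices = ByteBuffer.allocateDirect(maxQuads * 6 * 2 * 4)
                             .order(ByteOrder.nativeOrder()).asFloatBuffer();
        int[] ids = new int[1];
        GLES11.glGenBuffers(1, ids, 0);
        vboId = ids[0];
    }

    // Apply translation/scale on the CPU so every quad lives in world space.
    public void addQuad(float x, float y, float w, float h) {
        float[] q = { x, y,  x + w, y,      x + w, y + h,
                      x, y,  x + w, y + h,  x,     y + h };
        vertices.put(q);
        vertexCount += 6;
    }

    // One upload + one draw call instead of one draw call per quad.
    public void flush() {
        vertices.flip();
        GLES11.glBindBuffer(GLES11.GL_ARRAY_BUFFER, vboId);
        GLES11.glBufferData(GLES11.GL_ARRAY_BUFFER, vertices.limit() * 4,
                            vertices, GLES11.GL_DYNAMIC_DRAW);
        GLES11.glEnableClientState(GLES11.GL_VERTEX_ARRAY);
        GLES11.glVertexPointer(2, GLES11.GL_FLOAT, 0, 0);
        GLES11.glDrawArrays(GLES11.GL_TRIANGLES, 0, vertexCount);
        GLES11.glBindBuffer(GLES11.GL_ARRAY_BUFFER, 0);
        vertices.clear();
        vertexCount = 0;
    }
}
```

Note that a glColor4f state change between quads would still split the batch; per-vertex colors via glColorPointer are the usual next step.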
Moving from a fixed pipeline to a full vertex/fragment shader pipeline is not easy at all. It requires a good amount of knowledge in 3D, so be careful and write a prototype first. (World-space or object-space lighting? How to transform normals? ...)
Vivien
I've recently begun learning OpenGL, starting with immediate mode, glPush/PopMatrix, and the glTranslate/Rotate/Scale functions. I've since switched over to vertex buffer objects for storing geometry, but I'm still using the push/pop matrix and transform functions. Is there a newer, more efficient method of performing these operations?
I've heard of glMultMatrix, but some sources have said this is less efficient.
If at all relevant, I am using LWJGL with Java for rendering.
Edit: Does anyone know about the performance impact of calling glViewport and gluPerspective, as well as other standard initialization functions? I have been told that it is often good practice to call these init functions along with the rendering code every update.
For modern OpenGL you want to write a vertex shader and multiply each vertex by the appropriate transform matrix in there. You'll want to pass in the matrices you need (probably model, view, and projection). You can calculate those matrices on the CPU on each render pass as needed. This means you won't need gluPerspective. You probably only need to call glViewport once unless you're trying to divide up the window and draw different things in each section. But I wouldn't expect it to cause any performance issues. You can always profile to see for sure.
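To make that concrete, here is a minimal sketch of such a vertex shader; the uniform and attribute names are my own convention, not anything mandated:

```java
// Minimal sketch (LWJGL-style): the vertex shader now does the work that
// glTranslate/glRotate/glScale and gluPerspective used to do.
String vertexShaderSource =
    "#version 150 core\n" +
    "uniform mat4 projection;\n" +  // replaces gluPerspective
    "uniform mat4 view;\n" +        // replaces the camera transforms
    "uniform mat4 model;\n" +       // replaces per-object push/pop + transforms
    "in vec3 position;\n" +
    "void main() {\n" +
    "    gl_Position = projection * view * model * vec4(position, 1.0);\n" +
    "}\n";

// Each frame, compute the matrices on the CPU and upload them, e.g.:
// int loc = GL20.glGetUniformLocation(program, "model");
// GL20.glUniformMatrix4(loc, false, modelMatrixBuffer);  // LWJGL 2 signature
```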
I am currently working on a project that renders large oil wells and subsurface data on an Android tablet using OpenGL ES 2.0.
The data comes in from a RESTful call made by the client (the tablet) to the server. I need to render two types of data: one is a set of vertices where I just join all the vertices (well rendering), and the other is the subsurface rendering, where each surface has huge triangle data associated with it.
I was able to reduce the size of the well data by approximating the next point and constructing the data to be sent to the client. But this cannot be done for the surface data, as each and every triangle is important for joining the triangles to form the surface.
I would appreciate it if you could suggest an approach to either reduce the data sent from the server or reduce the time taken to render such huge data effectively.
The way you can handle such a complex mesh really depends on the scope of your project. Unfortunately there is not much we can say based on the provided inputs, and the activity itself is not an easy task.
Usually, when the mesh is very complex, a typical approach to making the rendering process fast is to adopt dynamic Level Of Detail (in programming terminology, LOD).
The idea is to render "distant" meshes with a very low LOD (and therefore a much lower number of vertices to render) and to replace the mesh with a higher-resolution one every time the camera approaches the mesh's details.
This technique is widely used in computer games, for instance when terrain needs to be rendered. When the player is in a particular sector of the map, the mesh of that sector is at a high level of detail while the others are at low detail. As soon as the player moves, the different sectors switch to "high resolution" (allow me the term).
It is not an easy thing to do, but it works in many, many situations; a minimal selection sketch follows below.
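Here is a hypothetical sketch of the distance-based selection step in Java; the Mesh type, thresholds, and level count are assumptions for illustration only:

```java
// Hypothetical sketch of distance-based LOD selection.
interface Mesh { /* your renderable mesh type */ }

class LodMesh {
    private final Mesh[] levels;       // levels[0] = HD ... levels[n-1] = coarsest
    private final float[] thresholds;  // thresholds.length == levels.length - 1

    LodMesh(Mesh[] levels, float[] thresholds) {
        this.levels = levels;
        this.thresholds = thresholds;
    }

    // Each threshold the camera distance crosses pushes us to a coarser mesh.
    Mesh select(float cameraDistance) {
        int level = 0;
        while (level < thresholds.length && cameraDistance > thresholds[level]) {
            level++;
        }
        return levels[level];
    }
}
```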
In this Gamasutra article there is plenty of information on how this technique works:
http://www.gamasutra.com/view/feature/131596/realtime_dynamic_level_of_detail_.php?print=1
The idea, in your case, would be to take the mesh provided by the web service and treat it as the HD version of the mesh. Then (particularly if the mesh is composed of different objects), apply a triangular mesh-simplification algorithm to create LD meshes of the same objects. An example of the way you could proceed is well described here:
http://herakles.zcu.cz/~skala/PUBL/PUBL_2002/2002_Mesh-Simplification-ICCS2002.pdf
I hope this helps in some way.
Cheers
Maurizio
I have been working on a voxel game for some time now, but all I have really accomplished is the main menu and an item system. Now it's time to make the voxel engine. I have been searching for a while to find tutorials or an ebook that would teach me this, but the best I could find were someone's tutorials in C++, while I am making mine in Java. I have dabbled in C++ and C# in the past, but it was too difficult to translate (e.g. it relied on a class that Java doesn't have). What I know is that there are different methods for voxel engines, that they all begin with rendering a single cube, and that Perlin and simplex noise can be used to randomize terrain generation.
If anyone could point me in the correct direction, it would be most appreciated.
I will be checking back at least once an hour in case someone feels this thread is dead.
I'm not entirely sure what you are asking: how to make simplex noise, how to implement it in a voxel engine, or how to start making a voxel engine.
If you are asking how to start making a voxel engine, I would recommend practising with quads first (a 2D version) and focusing on getting an understanding of the theory. Once you are happy with your understanding, you should focus on the voxel class (one cube); it is very important to learn as much as you can from it. Then add more cubes so you can optimize rendering as much as you can, such that hidden faces are not rendered and even vertices are shared; voxel engines can be the most wasteful renderers if not optimized!
EDIT:
Optimization can be done through many methods. The first and most important is hidden-face removal: this involves removing the faces of voxels that are touching, which means you need to check whether a voxel exists on any given side of a voxel before rendering that face (e.g. before rendering the left face, check that there isn't a block to the left); see the sketch below. Next is the rendering method: do not render each face or each group individually; group them so they can be rendered faster. This can be done by using display lists or the more technical VBOs, which ensure the data is on the GPU or can be given to the GPU faster. For example, Minecraft groups voxels into huge 16x16x128 chunks and uses display lists. If you really want to reduce every single vertex in memory, you can also consider using strip drawing methods (in OpenGL); these require you to define certain vertices at a certain time in rendering but allow you to reuse a vertex for multiple faces.
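Here is a minimal sketch of the hidden-face check described above; the occupancy-grid storage and the addFace helper are assumptions for illustration:

```java
// Hypothetical sketch of hidden-face removal for a chunk stored as a
// boolean occupancy grid; addFace(...) is an assumed helper that appends
// the four vertices of one cube face to the chunk's mesh.
class ChunkMesher {
    static final int[][] SIDES = {
        { 1, 0, 0}, {-1, 0, 0},   // +x, -x
        { 0, 1, 0}, { 0,-1, 0},   // +y, -y
        { 0, 0, 1}, { 0, 0,-1},   // +z, -z
    };

    boolean solid(boolean[][][] world, int x, int y, int z) {
        return x >= 0 && y >= 0 && z >= 0
            && x < world.length && y < world[0].length && z < world[0][0].length
            && world[x][y][z];
    }

    void emitVisibleFaces(boolean[][][] world, int x, int y, int z) {
        if (!world[x][y][z]) return;  // empty cell: nothing to draw
        for (int side = 0; side < 6; side++) {
            int nx = x + SIDES[side][0];
            int ny = y + SIDES[side][1];
            int nz = z + SIDES[side][2];
            if (!solid(world, nx, ny, nz)) {
                addFace(x, y, z, side);  // only faces exposed to air are kept
            }
        }
    }

    void addFace(int x, int y, int z, int side) { /* append 4 verts to the mesh */ }
}
```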
Next would be understanding simplex noise. I can relate to there not being much material online about noise-generation algorithms; unfortunately I cannot link the material I used, as that was years ago. You can implement your noise algorithm in the 2D version to prove it works in a simpler environment and then copy it to the voxel version. Typical usage is to use the values as heights in the terrain (e.g. white = 255 becomes a column 255 high), as in the sketch below.
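As a sketch of that usage, assuming a 2D noise function returning values in [0, 1] (not defined here; plug in your Perlin/simplex implementation):

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical sketch: turn 2D noise into terrain column heights, as described
// above (e.g. a white value of 255 becomes a column 255 blocks high).
int[][] buildHeightmap(int width, int depth, int maxHeight, double scale,
                       DoubleBinaryOperator noise2) {  // assumed to return [0, 1]
    int[][] heights = new int[width][depth];
    for (int x = 0; x < width; x++) {
        for (int z = 0; z < depth; z++) {
            double n = noise2.applyAsDouble(x * scale, z * scale);
            heights[x][z] = (int) (n * maxHeight);
        }
    }
    return heights;
}
```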
I would recommend using Unity. The engine is already made, and you can add menus and titles with just a few lines of code. All of the game creation is in either C# or JavaScript, which shouldn't be any huge change from C++. Good luck!
Edit: To get real-time drawing, I started using LWJGL (which jMonkeyEngine is based on) and JOCL, with interoperability between OpenGL and OpenCL; now I can calculate and draw 100k particles in real time. Maybe a Mantle version of the jMonkey engine could cure this draw-call overhead problem.
For several days I have been learning the jMonkeyEngine (ver. 3.0) in Eclipse (64-bit Java) and trying to optimize a scene using the GeometryBatchFactory.optimize(rootNode); command.
Without optimization (with the capability of changing sphere positions):
Okay, only 1 fps, which originates from PCI-Express bandwidth plus JVM overhead.
With optimization (without the capability to change sphere positions):
Now it is 29 fps, even with an increased triangle count.
Java3D had a setCapability() method which made a scene object readable/writable even in optimized form. jMonkeyEngine 3.0 must be capable of something like this, but I couldn't find any trace of it (I searched tutorials and examples without success).
Question: How can I set read/write position/rotation/scale capabilities on the optimized nodes of a scene in jMonkey 3.0? If you cannot answer the first question, can you tell me why the triangle count increases when I use the optimize command? Do I have to create a new method to access the graphics card and change the variables myself (JOGL, maybe?)?
Scene information: 16k particles (spheres of 16x16 resolution) + 1 point light (and its 4096-resolution shadow).
I'm sure we can send several thousand float numbers in a millisecond through PCI-Express with ease.
Additional info: I'm using Aparapi kernels to update particle positions, which takes 10 milliseconds (16k * 16k interactions to calculate forces). (This does not change anything in optimized mode :( ) Can Aparapi access the optimized data?
In the case of batchNode.batch() optimization, here it is 1 fps again, with a reduced object count:
The object count is now only several hundred, but the fps is still 1!
Sending just the sphere positions to the GPU and letting it calculate the vertex positions could be better than calculating the vertices on the CPU and sending that huge data to the GPU.
Is no one here able to help? I already tried batchNode, but it did not help enough.
I don't want to change the 3D API, because the jMonkey people have already reinvented the wheel and I'm happy with the current situation. I'm just trying to squeeze out a little more performance (cancelling shadows gives a 100% speedup, but quality is important too!).
This Java program will become an asteroid-impact scene simulator (there will be a choice of asteroid size, mass, speed, and angle) using the marching-cubes algorithm with LOD (there will be millions of particles).
The marching-cubes algorithm would decrease the triangle count greatly. If you can't answer the first question, any marching-cubes (or any O(n) convex hull) algorithm for Java will be accepted! Data: x, y, z arrays as source and a triangle-strip array as target (iso-surface mesh points).
Thanks.
Here are some samples from the stream (at a much lower resolution):
1) Collapse of a cube-shaped rock group under gravitation:
2) The exclusion force starts to show itself:
3) Exclusion force + gravitation make the group form a smoother shape:
4) The group forms a sphere (as expected):
5) Then, a big stellar body approaches:
6) About to touch:
7) The moment of impact:
With the help of the Barnes-Hut algorithm and a truncated potential, particle numbers will be 10x (maybe 100x) higher.
Rather than the marching-cubes algorithm, a ghost cloth that wraps the n-body could give a low-resolution hull (easier than BH, but needing more computation).
The ghost cloth will be affected by the n-body (gravity + exclusion), but the n-body will not be affected by the cloth that wraps it. The n-body won't be rendered, but the cloth mesh will be rendered with a lower triangle count.
If MC or the above works, this will let the program render a wrapping cloth for ~200x more particles.
You can batch all Geometries in a scene (or a sub-node) that remain static.
Batching means that all Geometries with the same Material are combined into one mesh. This optimization only has an effect if you use relatively few (roughly up to 32) Materials in total. The pay-off is that batching takes extra time when the game is initialized; a minimal sketch follows below.
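A minimal sketch of that point, splitting static and dynamic content into separate nodes so only the static part is batched (the node names are my own):

```java
import com.jme3.scene.Node;
import jme3tools.optimize.GeometryBatchFactory;

// Minimal sketch: batch only the sub-node whose geometry never moves,
// and keep anything you still need to move in a separate node.
class SceneSetup {
    void buildScene(Node rootNode) {
        Node staticScenery = new Node("static");   // terrain, rocks, etc.
        Node movingSpheres = new Node("dynamic");  // the simulated particles
        rootNode.attachChild(staticScenery);
        rootNode.attachChild(movingSpheres);

        // Combines all Geometries under staticScenery that share a Material
        // into one mesh each; movingSpheres is untouched and stays editable.
        GeometryBatchFactory.optimize(staticScenery);
    }
}
```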
The change in triangle count is therefore because they have all been assembled into one mesh. The only suggestion, if this is necessary, is to try getting the mesh and altering points on it, but at that point I don't think it makes sense.
Perhaps try a different optimization method.
Good luck! I haven't used jMonkey in a bit, but I'm glad to see others do, and to see its continued growth!
EDIT
By the way, a way to minimize the math might be to use half a sphere of cubes; an impact on the Earth likely wouldn't affect the other side (unless the sphere isn't the Earth but already a small sample of the Earth taken as a sphere)...
Perhaps try a 2D shape as the impact surface. Though I know this won't be your best choice, it might give you an idea of how much of an effect the number of shapes has. If it does, then one avenue might be to consider how to remove some of the particles; if it doesn't, you need not worry. I am almost sure it will.
Finally:
Perhaps don't render in real time? Take a minute to draw the frames to a buffer, then play them back; by the time you're playing, you will have another 40 or so frames ready, and maybe approximately 30 seconds' worth is all you will need.
There is a pretty solid set of documentation within the JMonkeyEngine wiki which talks quite a bit about how to utilize the transformations you are referring to, which can be found here: Advanced Spatial Concepts.
In addition, there is quite a bit of information regarding the meshes and their rendering which you can view here: Polygon Meshes.
Maybe there's someone out there who has spent time on this. I'm working on a graph visualization lib in Java and I just did some performance tests.
When I'm adding about 2000 vertices connected by 1000-3000 edges, it gets really, really slow. There are tools out there that do way better (Gephi, for example). How do they do it? Isn't Java2D hardware-accelerated by default? Do I have to use some OpenGL lib?
I'm drawing the graphs inside a JComponent which gets redrawn by a timer every few milliseconds (it doesn't really matter whether I give it 100 ms or 1 ms, it stays really slow).
Is my approach flawed or shouldn't I use Java2D for this?
Thank you for any help!
As Torious suggested, you probably want to use a VolatileImage if you are working in Java2D, to get the benefits of hardware acceleration.
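A minimal sketch of the VolatileImage pattern, assuming a drawGraph(...) method standing in for your own vertex/edge drawing:

```java
import java.awt.Graphics2D;
import java.awt.GraphicsConfiguration;
import java.awt.image.VolatileImage;

// Minimal sketch: render the graph into an accelerated off-screen image,
// blit it, and re-render if the surface was lost (e.g. after a display
// mode change). drawGraph(...) is a hypothetical placeholder.
class GraphCanvasBuffer {
    private VolatileImage buffer;

    void paint(Graphics2D screen, GraphicsConfiguration gc, int w, int h) {
        do {
            if (buffer == null
                    || buffer.validate(gc) == VolatileImage.IMAGE_INCOMPATIBLE) {
                buffer = gc.createCompatibleVolatileImage(w, h);
            }
            Graphics2D g = buffer.createGraphics();
            drawGraph(g);                 // your vertex/edge drawing goes here
            g.dispose();
            screen.drawImage(buffer, 0, 0, null);
        } while (buffer.contentsLost());  // surface lost: loop and redraw
    }

    private void drawGraph(Graphics2D g) { /* draw vertices and edges */ }
}
```

The validate/contentsLost loop is the extra bookkeeping VolatileImage needs compared to a plain BufferedImage; it is what buys you the accelerated surface.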
However, if you want the absolute best performance, you are probably better off going for an OpenGL-based solution.
LWJGL (http://lwjgl.org/) is designed for games but lets you use pretty much all the relevant OpenGL functionality, so it is pretty good for visualisation as well. Might be worth giving it a try!