Edit: To get real-time drawing, I started using LWJGL (the base of jMonkeyEngine) together with JOCL for OpenGL/OpenCL "interoperability"; now I can calculate and draw 100k particles in real time. Maybe a Mantle-based version of jMonkeyEngine could cure this draw-call overhead problem.
For several days I have been learning jMonkeyEngine (version 3.0) in Eclipse (64-bit Java) and trying to optimize a scene using the GeometryBatchFactory.optimize(rootNode); command.
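In case it helps to see it, the batching call in question is just this (a minimal sketch):

import jme3tools.optimize.GeometryBatchFactory;

// merge all child geometries of rootNode that share a Material into one mesh
GeometryBatchFactory.optimize(rootNode);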
Without optimization (with the ability to change the spheres' positions):
Okay, only 1 fps, originating from PCI Express bandwidth plus JVM overhead.
With optimization (without the ability to change the spheres' positions):
Now it is 29 fps, even with an increased triangle count.
Java3D had a setCapability() method that let a scene object be read/written even in optimized form. jMonkeyEngine 3.0 should be capable of the same thing, but I couldn't find any trace of it (I searched the tutorials and examples and failed).
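For comparison, the Java3D idiom I mean looked roughly like this (a sketch): capabilities were flagged on an object before the scene graph was compiled.

import javax.media.j3d.TransformGroup;

TransformGroup tg = new TransformGroup();
// allow the transform to be read and written even after compilation/optimization
tg.setCapability(TransformGroup.ALLOW_TRANSFORM_READ);
tg.setCapability(TransformGroup.ALLOW_TRANSFORM_WRITE);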
Question: How can I set read/write position/rotation/scale capabilities on the optimized nodes of a scene in jMonkeyEngine 3.0? If you cannot answer the first question, can you tell me why the triangle count increases when I use the optimization command? Do I have to write my own method to access the graphics card and change the variables myself (JOGL, maybe)?
Scene information: 16k particles (spheres of 16x16 resolution) + 1 point light (and its 4096-resolution shadow).
I'm sure we can send several thousand floats through PCI Express in a millisecond with ease.
Additional info: I'm using Aparapi kernels to update particle positions, which takes 10 milliseconds (16k * 16k interactions to calculate forces). (This does not change anything in optimized mode :( )
Can Aparapi access that optimized data?
For the case of batchNode.batch(); optimization, here it is 1 fps again, with a reduced object count:
The object count is now only several hundred, but fps is still at 1!
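For reference, the BatchNode attempt looks roughly like this (a sketch; 'spheres' stands in for my particle geometries). Unlike GeometryBatchFactory.optimize(), a BatchNode is meant to keep its children movable after batching:

import com.jme3.scene.BatchNode;
import com.jme3.scene.Geometry;

BatchNode batchNode = new BatchNode("particles");
for (Geometry sphere : spheres) {
    batchNode.attachChild(sphere);
}
batchNode.batch();               // merge children that share a Material
rootNode.attachChild(batchNode);
// per frame: transforming a child updates the underlying batched mesh
spheres.get(0).setLocalTranslation(1f, 0f, 0f);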
Sending just the sphere positions to the GPU and letting it calculate the vertex positions could be better than calculating the vertices on the CPU and sending the huge data to the GPU.
No one here to help? I already tried BatchNode, but it did not help enough.
I don't want to change 3D APIs, because the jMonkey people have already reinvented the wheel and I'm happy with the current situation. I'm just trying to squeeze out a little more performance (disabling shadows gives a 100% speedup, but quality is important too!).
This Java program will become an asteroid-impact scene simulator (there will be a choice of asteroid size, mass, speed, and angle) using the marching-cubes algorithm with LOD (there will be millions of particles).
The marching-cubes algorithm would decrease the triangle count greatly. If you can't answer the question above, any marching-cubes (or any O(n) convex hull) algorithm for Java will be accepted! Data: x, y, z arrays as the source and a triangle-strip array as the target (iso-surface mesh points).
Thanks.
Here are some samples from the stream (at a much lower resolution):
1) Collapsing of a cube-shaped rock group under gravitation:
2) The exclusion force starts to show itself:
3) Exclusion force + gravitation makes the group form a smoother shape:
4) The group forms a sphere (as expected):
5) Then a big stellar body approaches:
6) About to touch:
7) The moment of impact:
With the help of the Barnes-Hut algorithm and a truncated potential, particle counts will be 10x (maybe 100x) higher.
Rather than the marching-cubes algorithm, a ghost cloth that wraps the n-body could give a low-resolution hull (easier than BH, but needing more computation).
The ghost cloth will be affected by the n-body (gravity + exclusion), but the n-body will not be affected by the cloth that wraps it. The n-body won't be rendered, but the cloth mesh will be rendered with a lower triangle count.
If MC or the above works, this will let the program render a wrapping cloth for ~200x more particles.
So sorry....
You can batch all Geometries in a scene (or a subnode) that remain static.
Batching means that all Geometries with the same Material are combined into one mesh. This optimization only has an effect if you use only a few (roughly up to 32) Materials in total. The pay-off is that batching takes extra time when the game is initialized.
The change in triangle count is therefore because they have all been assembled into one mesh.... The only suggestion, if this is necessary, is to try getting the mesh and altering points on it, but at that point I don't think it makes sense.
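If you do try that route, here is a hedged sketch of what it might look like with the jME3 mesh API ('batched' and 'vertexIndex' are placeholders for one of the geometries produced by the optimization and the vertex you want to move):

import java.nio.FloatBuffer;
import com.jme3.scene.Geometry;
import com.jme3.scene.Mesh;
import com.jme3.scene.VertexBuffer;

static void movePoint(Geometry batched, int vertexIndex, float x, float y, float z) {
    Mesh mesh = batched.getMesh();
    VertexBuffer pb = mesh.getBuffer(VertexBuffer.Type.Position);
    FloatBuffer positions = (FloatBuffer) pb.getData();
    positions.put(vertexIndex * 3, x);
    positions.put(vertexIndex * 3 + 1, y);
    positions.put(vertexIndex * 3 + 2, z);
    pb.updateData(positions); // mark the buffer dirty so it is re-sent to the GPU
    mesh.updateBound();       // keep the culling bounds in sync
}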
Perhaps try a different optimization method.
Good luck! I haven't used jMonkey in a bit, but I'm glad to see others do, and to see its continued growth!
EDIT
BTW, a way to minimize the math might be to use half a sphere of cubes; an impact on the Earth likely wouldn't affect the other side (unless the sphere isn't the Earth but is already a small sample of the Earth taken as a sphere)...
Perhaps try a 2D shape as the impact surface. Though I know this won't be your best choice, it might give you an idea of whether the number of shapes has an effect, and how large. If it does, one avenue might be to consider how to remove some of the particles; if it doesn't, you need not worry. I am almost sure it will.
Finally:
Perhaps don't render in real time? Take a minute to draw the frames to a buffer, then play them back; by the time you're playing, you will have another 40 or so frames ready, etc., and maybe approximately 30 seconds' worth is all you will need.
There is a pretty solid set of documentation within the JMonkeyEngine wiki which talks quite a bit about how to utilize the transformations you are referring to, which can be found here: Advanced Spatial Concepts.
In addition, there is quite a bit of information regarding the meshes and their rendering which you can view here: Polygon Meshes.
Related
I'm using Java2D in conjunction with Apache Batik to draw some fairly large SVG images.
So far it is working quite nicely, but I am frustrated with the performance of Areas. In particular, there are three things I want to accomplish:
merging a bunch of colliding shapes into one large area
removing a bunch of shapes from one large area
checking for colliding shapes
Naively, points 1 and 2 can be accomplished with Area.add and Area.subtract.
This works, but can easily take up to twenty minutes in an average use case.
Point 3 can be accomplished by subtracting the areas from each other and checking the remaining area. Still slow, but it can be sped up to be usable with some prior spatial hashing or something similar.
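To illustrate, a cheap bounding-box rejection before any Area math (plus an intersection test instead of subtract-and-check) already removes most pairs; a minimal sketch:

import java.awt.Shape;
import java.awt.geom.Area;

static boolean collides(Shape a, Shape b) {
    // cheap reject: if the bounding boxes don't touch, the shapes can't collide
    if (!a.getBounds2D().intersects(b.getBounds2D())) {
        return false;
    }
    Area overlap = new Area(a);
    overlap.intersect(new Area(b)); // cheaper than subtracting and comparing
    return !overlap.isEmpty();
}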
Is there a better and faster way to merge/subtract Java2D Areas?
If not, is there another library that can do this sort of thing faster?
Unfortunately, libraries like JOGL or LWJGL do not work in a resolution-independent space like SVG paths or Java2D Paths.
You can try this: AreaX
According to the author:
The AreaX class is intended to achieve exactly the same visual results as the Area class. However several possible optimizations have been carefully implemented to reach those results faster.
I have been working on a voxel game for some time now, but all that I have really accomplished is the main menu and an item system. Now it's time to make the voxel engine. I have been searching for a while to find tutorials or an ebook on the subject, but the best I could find were someone's tutorials in C++, and I am making mine in Java. I have dabbled in C++ and C# in the past, but it was too difficult to translate, i.e. it relied on a class that Java doesn't have. What I know is that there are different methods for voxel engines, that they all begin with rendering a single cube, and that Perlin and simplex noise can be used to randomize terrain generation.
If anyone could point me in the right direction, it would be most appreciated.
I will be checking back at least once an hour in case someone feels this thread is dead.
I'm not entirely sure what you are asking: how to make simplex noise, how to implement it in a voxel engine, or how to start making a voxel engine.
If you are asking how to start making a voxel engine, I would recommend practising with quads first (a 2D version) and focusing on understanding the theory. Once you are happy with your understanding, you should focus on the voxel class (one cube); it is very important to learn as much as you can from it. Then add more cubes so you can optimize rendering as much as you can, such that hidden faces are not rendered and vertices are even shared; voxel engines can be the most wasteful renderers if not optimized!
EDIT:
Optimization can be done through many methods. The first and most important is hidden face removal: this involves removing the faces of voxels that are touching, which means you need to check whether a voxel exists on any given side of a voxel before rendering that face (e.g. before rendering the left face, check that there isn't a block to the left; a sketch follows below). Next is the rendering method: do not render each face or each group individually; group them so they can be rendered faster. This can be done by using display lists or the more technical VBOs; these ensure the data is in the GPU, or can be given to the GPU faster. For example, Minecraft groups voxels into huge 16x16x128 chunks and uses display lists. If you really want to reduce every single vertex in memory, you can also consider using strip drawing methods (in OpenGL); these require you to define certain vertices at a certain time in rendering, but allow you to reuse a vertex for multiple faces.
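A hedged sketch of the hidden-face check ('solid' and the emit...Face methods are assumed helpers over your chunk data):

for (int x = 0; x < SIZE; x++)
    for (int y = 0; y < SIZE; y++)
        for (int z = 0; z < SIZE; z++) {
            if (!solid(x, y, z)) continue;        // empty cell, nothing to draw
            if (!solid(x - 1, y, z)) emitLeftFace(x, y, z);
            if (!solid(x + 1, y, z)) emitRightFace(x, y, z);
            if (!solid(x, y - 1, z)) emitBottomFace(x, y, z);
            if (!solid(x, y + 1, z)) emitTopFace(x, y, z);
            if (!solid(x, y, z - 1)) emitBackFace(x, y, z);
            if (!solid(x, y, z + 1)) emitFrontFace(x, y, z);
        }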
Next would be understanding simplex noise. I can relate to there not being much material online for noise generation algorithms; unfortunately I cannot link the material that I used, as that was years ago. You can implement your noise algorithm in the 2D version to prove it works in a simpler environment, and then copy it to the voxel version. Typical usage is to use the values as heights in the terrain (e.g. white = 255 = 255 high).
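For example, a minimal sketch of that usage (noise2, setVoxel, MAX_HEIGHT and STONE are assumed helpers):

// map a noise value in [-1, 1] to a column height, then fill the column
int height = (int) ((noise2(x * 0.01, z * 0.01) + 1) * 0.5 * MAX_HEIGHT);
for (int y = 0; y <= height; y++) {
    setVoxel(x, y, z, STONE);
}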
I would recommend using Unity. The engine is already made, and you can add menus and titles with just a few lines of code. All of the game creation is in either C# or JavaScript, which shouldn't be a huge change from C++. Good luck!
I'm busy coding Conway's Game of Life, and I'm trying to optimise it using a data structure that records which cells should be checked in each life cycle.
I'm using an ArrayList as a dynamic data structure to keep a record of all living cells and their neighbours. Is there a better data structure, or a way of keeping a shorter list, that will increase the game's speed?
I ask this because often many cells are checked but not changed, so I feel my implementation could be improved.
I believe that the Hashlife algorithm could help you.
It gives the idea of using a quadtree (a tree data structure in which each internal node has exactly four children) to keep the data, and then using hash tables to store the nodes of the quadtree.
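A hedged sketch of that backbone in Java (just the interned, "hash-consed" quadtree node, not the Hashlife time-stepping itself):

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class QNode {
    final QNode nw, ne, sw, se; // four quadrants; null for level-0 leaves
    final int level;            // this region's side length is 2^level
    final boolean alive;        // only meaningful for leaves (level == 0)

    // hash-consing table: structurally equal nodes collapse to one instance,
    // so identical regions of the universe are stored (and computed) once
    private static final Map<QNode, QNode> CACHE = new HashMap<>();

    private QNode(QNode nw, QNode ne, QNode sw, QNode se, int level, boolean alive) {
        this.nw = nw; this.ne = ne; this.sw = sw; this.se = se;
        this.level = level; this.alive = alive;
    }

    static QNode leaf(boolean alive) {
        return intern(new QNode(null, null, null, null, 0, alive));
    }

    static QNode of(QNode nw, QNode ne, QNode sw, QNode se) {
        return intern(new QNode(nw, ne, sw, se, nw.level + 1, false));
    }

    private static QNode intern(QNode n) {
        return CACHE.computeIfAbsent(n, k -> k);
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof QNode)) return false;
        QNode n = (QNode) o;
        // children are interned, so identity comparison is enough
        return level == n.level && alive == n.alive
                && nw == n.nw && ne == n.ne && sw == n.sw && se == n.se;
    }

    @Override public int hashCode() {
        return Objects.hash(level, alive,
                System.identityHashCode(nw), System.identityHashCode(ne),
                System.identityHashCode(sw), System.identityHashCode(se));
    }
}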
For further reading, this post, written by Eric Burnett, gives great insight into how Hashlife works, its performance, and its implementation (although in Python). It's worth a read.
I built a Life engine that operated on 256x512 bit grids directly mapped to screen pixels back in the 1970s, using a 2 MHz 6800 8-bit computer. I did it directly on the display pixels (they were one-bit on/off white/black) because I wanted to see the results and didn't see the point in copying the Life image to the display.
Its fundamental trick was to treat the problem as one of evaluating a Boolean logic formula for "this cell is on" based on the rules of Life, rather than counting live neighbors as is usual. This formula is pretty easy to figure out, so it's left as a homework exercise. What made it fast was that the Boolean formula was computed on a per-bit basis, 8 bits at a time. If you sweep down the screen and across rows, you can in essence evaluate N bits at once (8 on the 6800, 64 on modern PCs) with very low overhead. If you go nuts, you can likely use the SIMD vector extensions and do 256 bits or more at "once". Over the top would be doing this with a GPU.
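To make the idea concrete, here is a hedged sketch of the word-parallel version in Java. It counts neighbors with bitwise full adders rather than spelling out the closed-form Boolean formula (that stays a homework exercise); each long holds one 64-cell row, and cells beyond the grid or word edge are treated as dead:

// one generation of B3/S23 Life over a grid of 64-bit rows
static long[] step(long[] rows) {
    long[] next = new long[rows.length];
    for (int y = 0; y < rows.length; y++) {
        long up   = y > 0 ? rows[y - 1] : 0L;
        long mid  = rows[y];
        long down = y + 1 < rows.length ? rows[y + 1] : 0L;
        // the eight neighbor bitboards (west = <<1, east = >>>1)
        long[] nbrs = { up << 1, up, up >>> 1, mid << 1, mid >>> 1,
                        down << 1, down, down >>> 1 };
        long s0 = 0, s1 = 0, s2 = 0; // bit-sliced neighbor count: ones, twos, fours
        for (long m : nbrs) {
            long c0 = s0 & m; s0 ^= m;   // add one into the ones plane
            long c1 = s1 & c0; s1 ^= c0; // carry into the twos plane
            s2 ^= c1;                    // carry into the fours plane
        }
        // next state: exactly 3 neighbors, or alive with exactly 2 (count pattern 01x)
        next[y] = ~s2 & s1 & (s0 | mid);
    }
    return next;
}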
The 6800 version would process a complete screen in about 0.5 seconds; you could watch the update ripple down the screen from top to bottom (60 Hz refresh). On a modern CPU with 1000x the clock rate (1 GHz is pretty easy to get), doing 64 bits at a time, it should be able to produce thousands of frames per second. So fast you can't watch it run :-{
A useful observation is that much of the Life world is dead (blank) and processing that part mostly produces more dead cells. This suggests using a sparse representation. Another poster suggested quadtrees, which I think is a very good suggestion. Your quadtree regions don't have to be square, either.
Combining the two ideas, quadtrees for non-blank regions with bit-level processing for blocks of bits designated by the quadtrees is likely to give an astonishingly fast Life algorithm.
I am investigating this field to obtain object detection in real time.
Video example:
http://www.youtube.com/watch?v=Bm5qUG-06V8
http://www.youtube.com/watch?v=aYd2kAN0Y20
But how can they extract SIFT keypoints and match them so fast?
SIFT extraction generally takes on the order of a second.
I'm an OpenIMAJ developer, and I'm responsible for making the first video.
We're not doing anything particularly fancy to make the matching fast in that video, and the SIFT detection and extraction is carried out on the entirety of every frame. In fact that video was made well before we did any optimisation; the current version of that demo is much smoother. We do also have a version with a hybrid KLT-tracker that works even faster by not having to perform SIFT on every frame.
As suggested by @Mario, the image size does have a big effect on the speed of the extraction, so processing a smaller frame can give a big win. Secondly, in the original description of the difference-of-Gaussian interest point localisation in Lowe's SIFT paper, it was suggested that the input image first be doubled in size to increase the number of features. By not performing this double-sizing you also get a big performance boost, at the expense of having fewer features to match.
The code is open source (BSD license) and you can get it by following the links at http://www.openimaj.org. As stated in the video description, the image-processing code is pure Java; the only native code is a thin interface to the webcam. Tutorial number 7 in the current tutorial pdf document walks through the process of using SIFT in OpenIMAJ. Disabling the double-sizing can be achieved by doing:
DoGSIFTEngine engine = new DoGSIFTEngine();
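// skip the initial 2x up-scaling of the input image: fewer features, much faster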
engine.getOptions().setDoubleInitialImage(false);
SIFT can be accelerated in several ways:
if you can afford approximations, then you can use a derived keypoint type called SURF, which is way faster (it uses integral images for most tasks)
you can use parallel implementations, at the CPU level (e.g. OpenCV uses Intel's TBB) or at the GPU level (google for "SIFT GPU" to find related code and docs).
Anyway, none of these is available (AFAIK) in Java, so you'll have to use a Java wrapper to OpenCV or work it out yourself.
General and first idea: ask the video uploader(s). Otherwise we can only guess at what's done or how it's done. It might also help to know what you've done so far (e.g. your video resolution, your processing power, image preparation, etc.).
I haven't used SIFT specifically, but I did quite some object/motion tracking during the last few years, so this is more general advice. You might have tried some of these points already; I don't know.
Reduce your image resolution: going from 640x480 to 320x240 will reduce your data to 25%. Going down to 160x120 will cut it to 25% of that again (so 6.25% of the data left) without significantly impacting your algorithm.
In a similar way, it might be useful to reduce the color depth of your image (not just to 256 grayscale levels, but maybe even fewer, like 64 colors).
Try other methods to make features more obvious or faster to find, e.g. by running an edge detector over your image.
At least the second video mentions a tracking system, so you could try to guess the region where the tracked object should reappear in the next frame (using some simple a/b filter or whatever on coordinates and possibly rotation), then use SIFT only on that sub-area (with some added margin). Only analyze the whole image if you can't find it again. At around 40 or 50 seconds into the second video they lose the object and need quite some time/tries to find it again.
I've been using Incanter for my graphing needs; it was adequate but slow for my previous purposes.
Now I need to embed a graph in a JPanel. Users will need to interact with the graph by dragging and clicking (e.g. clicking on certain points, which the program would need to receive and deal with). Zooming in and out is a must as well.
I've heard about JFreeChart in other SO discussions, but I see that Incanter uses it as its graphing engine, and it seemed somewhat slow then. Is it actually fast, and is Incanter perhaps doing things that slow it down?
I'm graphing up to 2 million points (simple xy plots, really), though generally I will be graphing fewer. Using MATLAB, this is plotted in a few seconds, but Incanter can hang for minutes.
So is JFreeChart the way to go? Or something else, given my needs?
(Also, it needs to be free, as it is for research.)
Unfortunately, general purpose graphing solutions probably aren't going to scale well to 2 million points - that's big enough that you will need something specialized if you want interactive performance.
I can see a few sensible options:
Write your own custom "plotter" function that is optimized for drawing large numbers of points. You'd have to test it, but I think you might just about get the performance you want by writing the points directly to a BufferedImage using setRGB in a tight loop (see the sketch after this list). If that still isn't fast enough, you can write the points directly into a byte array and construct a MemoryImageSource.
Exclude points so that you are only drawing e.g. 10,000 points maximum. This may be perfectly acceptable as long as you only really care about the overall shape of the scatter plot rather than individual points.
Pre-render all the points into e.g. a large BufferedImage then allow users to zoom in and out / interact with this static image. You might be able to "hack" JFreeChart to do this.
If OpenGL is an option (it will require native code plus climbing a steep learning curve!), then drop all the points into a big vertex array and get the graphics card to do it all for you. That will handle 2 million points in real time without any difficulty.
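To illustrate the first option, a minimal pixel-plotter sketch (the array names and bounds parameters are made up for the example):

import java.awt.image.BufferedImage;

static BufferedImage plot(double[] xs, double[] ys,
                          double xMin, double xMax, double yMin, double yMax,
                          int w, int h) {
    BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
    double sx = (w - 1) / (xMax - xMin);
    double sy = (h - 1) / (yMax - yMin);
    for (int i = 0; i < xs.length; i++) {
        int px = (int) ((xs[i] - xMin) * sx);
        int py = (h - 1) - (int) ((ys[i] - yMin) * sy); // flip y: screens grow downward
        if (px >= 0 && px < w && py >= 0 && py < h) {
            img.setRGB(px, py, 0xFFFFFFFF); // one white pixel per point
        }
    }
    return img;
}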
MathGL is a fast and free (GPL) plotting library. But I have never tested its Java interface (SWIG-based), since I'm not familiar with Java :( . So if someone can help with testing, I'll be thankful.