I'm busy coding Conway's Game of Life and I'm trying to optimise it using some data structure that records which cells should be checked on each life cycle.
I'm using an ArrayList as a dynamic data structure to keep a record of all living cells and their neighbours. Is there a better data structure, or a way of keeping a shorter list, that will increase the game's speed?
I ask this because many cells are often checked but not changed, so I feel my implementation could be improved.
I believe that the Hashlife algorithm could help you.
The idea is to use a quadtree (a tree data structure in which each internal node has exactly four children) to hold the board, and then to use hash tables to store the nodes of the quadtree so that identical subtrees are shared.
For further reading, this post, written by Eric Burnett, gives a great insight into how Hashlife works, its performance and implementation (although in Python). It's worth a read.
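To make the structure concrete, here is a minimal hash-consed quadtree node in Java. It is only an illustrative sketch (the names are made up, not Hashlife's or any library's API), but it shows how a hash table can store each distinct subtree exactly once so that identical regions of the board are shared:
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Minimal hash-consed quadtree node: identical subtrees are shared via the cache,
// which is the property Hashlife's memoized step function builds on.
final class QuadNode {
    final int level;                      // level 0 = a single cell
    final boolean alive;                  // only meaningful at level 0
    final QuadNode nw, ne, sw, se;        // null at level 0

    private static final Map<QuadNode, QuadNode> CACHE = new HashMap<>();

    private QuadNode(int level, boolean alive, QuadNode nw, QuadNode ne, QuadNode sw, QuadNode se) {
        this.level = level; this.alive = alive;
        this.nw = nw; this.ne = ne; this.sw = sw; this.se = se;
    }

    static QuadNode leaf(boolean alive) {
        return intern(new QuadNode(0, alive, null, null, null, null));
    }

    static QuadNode of(QuadNode nw, QuadNode ne, QuadNode sw, QuadNode se) {
        return intern(new QuadNode(nw.level + 1, false, nw, ne, sw, se));
    }

    private static QuadNode intern(QuadNode n) {
        return CACHE.computeIfAbsent(n, k -> k);   // the hash table stores and shares the nodes
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof QuadNode)) return false;
        QuadNode q = (QuadNode) o;
        return level == q.level && alive == q.alive
                && nw == q.nw && ne == q.ne && sw == q.sw && se == q.se;   // children are interned, so == is enough
    }

    @Override public int hashCode() {
        return Objects.hash(level, alive,
                System.identityHashCode(nw), System.identityHashCode(ne),
                System.identityHashCode(sw), System.identityHashCode(se));
    }
}
Hashlife then memoizes the result of stepping each node, keyed on these shared nodes, which is where the dramatic speedups come from.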
I built a Life engine that operated on 256x512 bit grids directly mapped to screen pixels back in the 1970s, using a 2 MHz 6800 8-bit computer. I did it directly on the display pixels (they were one-bit on/off white/black) because I wanted to see the results and didn't see the point of copying the Life image to the display.
Its fundamental trick was to treat the problem as one of evaluating a Boolean logic formula for "this cell is on" based on the rules of Life, rather than counting live neighbors as is usual. This formula is pretty easy to figure out, so it is left as a homework exercise. What made it fast was that the Boolean formula was computed on a per-bit basis, doing 8 bits at a time. If you sweep down the screen and across rows, you can in essence evaluate N bits at once (8 on the 6800, 64 on modern PCs) with very low overhead. If you go nuts, you can likely use the SIMD vector extensions and do 256 bits or more at "once". Over the top would be doing this with a GPU.
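To illustrate the word-parallel idea in Java (this is a sketch of the general technique, not necessarily the same Boolean formula as above; it packs one row of 64 cells per long and does not wrap around at the row ends):
final class BitLife {
    // One generation for a grid stored one row per 64-bit word (bit i = column i).
    static long[] step(long[] rows) {
        long[] next = new long[rows.length];
        for (int r = 0; r < rows.length; r++) {
            long above = (r > 0) ? rows[r - 1] : 0L;
            long below = (r < rows.length - 1) ? rows[r + 1] : 0L;
            next[r] = rowStep(above, rows[r], below);
        }
        return next;
    }

    // Updates all 64 cells of one row at once using bit-sliced neighbour sums.
    static long rowStep(long above, long cur, long below) {
        // the eight neighbour bit-columns, shifted so each lines up with the cell it affects
        long a = above << 1, b = above, c = above >>> 1;
        long d = cur   << 1,            e = cur   >>> 1;
        long f = below << 1, g = below, h = below >>> 1;

        // bit-sliced adders: low bit and carry of each small group of neighbours
        long s1 = a ^ b ^ c, c1 = (a & b) | ((a ^ b) & c);   // row above (3 neighbours)
        long s2 = d ^ e,     c2 = d & e;                     // same row (2 neighbours)
        long s3 = f ^ g ^ h, c3 = (f & g) | ((f ^ g) & h);   // row below (3 neighbours)

        // combine the low bits: bit0 of the neighbour count, plus a carry into the "twos"
        long t = s1 ^ s2;
        long bit0  = t ^ s3;
        long carry = (s1 & s2) | (t & s3);

        // the "twos" level has four operands: c1, c2, c3 and the carry from above
        long p = c1 ^ c2, q = c3 ^ carry;
        long twosOdd  = p ^ q;                                // an odd number of "twos"
        long twosHigh = (c1 & c2) | (c3 & carry) | (p & q);   // two or more "twos" -> count >= 4

        // neighbour count is exactly 2 or 3 iff there is exactly one "two" (and count < 4)
        long twoOrThree = twosOdd & ~twosHigh;

        // Life rule: a cell is on next generation if count == 3, or it is on now and count == 2
        return twoOrThree & (bit0 | cur);
    }
}
Each generation then costs a few dozen word-wide logic operations per row, independent of how many cells are alive.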
The 6800 version would process a complete screen in about 0.5 seconds; you could watch the update ripple down the screen from top to bottom (60 Hz refresh). On a modern CPU with 1000x the clock rate (1 GHz is pretty easy to get) and 64 bits at a time, it should be able to produce thousands of frames per second. So fast you can't watch it run :-{
A useful observation is that much of the Life world is dead (blank) and processing that part mostly produces more dead cells. This suggests using a sparse representation. Another poster suggested quadtrees, which I think is a very good suggestion. Your quadtree regions don't have to be square, either.
Combining the two ideas, quadtrees for non-blank regions with bit-level processing for blocks of bits designated by the quadtrees is likely to give an astonishingly fast Life algorithm.
For a small project we're trying to implement an autopilot for a slot car. A gyro sensor is attached to the car and delivers the Z-value (i.e. the amount of centrifugal force acting on the car/sensor) 20 times per second. One crucial part of this is detecting whether the car is in a curve or on a straight section, and exactly when it entered and left that section. Only then can we make a reliable prediction of what will happen next.
As for now, we're working with a sliding window to smooth the data and then have hardcoded limits (-400 for a left curve and +400 for a right curve) to detect what kind of sector (left, right, straight) we're in.
Obviously this takes too long: because of the smoothing and the hardcoded limits, it takes a couple of messages until the program detects a direction change.
Here's an example of two rounds on a simple track, starting at the checkered area:
A perfect algorithm would detect the sectors S R S R S L S R S R S R S for one round, with a delay of only a couple of data points.
We thought about using the first derivative of the gyro values, but in the sample graph right after the first left curve, the following right curve (between 22:36:40 and 22:36:42) shows signs of swerving. Here the first derivative would be close to 0 and indicate a straight part...
Also, we'd again need to set a hardcoded threshold there, and with the noise in the data a small bump in the track could produce enough noise for its derivative to exceed the threshold.
Now we're not sure about what would be the easiest/fastest/most reliable way to handle this sort of detection. Would using a derivative be a good idea? Is there a better way?
Any input would be greatly appreciated :)
The existing software is written in Java.
In such problems, you have to trade robustness for immediacy. If you don't know what happens in the future, you can only make assumptions. And these assumptions may hold or may not.
From the looks of your data, there shouldn't be any smoothing necessary. If you define a reasonable threshold, the curves should be recognized quite reliably. If, however, this is not the case, here are some things you could try:
You already mentioned smoothing. The crucial point is how you smooth. An asymmetric smoothing kernel is probably desirable (a half-triangle filter can be updated in constant time, as the sketch below shows). You can trade robustness against immediacy directly by modifying the kernel width.
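Here is a small sketch, under the assumption that "half triangle" means linearly decaying weights with the newest sample weighted most; the class name and window size are illustrative:
// A causal "half triangle" smoother: the newest sample gets weight n, older samples
// linearly less. Both running sums are updated in O(1), independent of the window size.
final class HalfTriangleFilter {
    private final double[] window;     // ring buffer of the last n samples
    private final int n;
    private int head = 0, filled = 0;
    private double sum = 0;            // plain sum of the samples in the window
    private double weighted = 0;       // sum with weights n, n-1, ..., 1 (newest to oldest)

    HalfTriangleFilter(int n) {
        this.n = n;
        this.window = new double[n];
    }

    /** Feed one gyro sample, get back the smoothed value. */
    double update(double x) {
        double oldest = window[head];          // sample falling out of the window (0 while filling)
        window[head] = x;
        head = (head + 1) % n;
        if (filled < n) filled++;

        weighted = weighted - sum + n * x;     // every old weight drops by 1, the new sample gets weight n
        sum = sum + x - oldest;

        double weightTotal = filled * n - filled * (filled - 1) / 2.0;   // sum of the weights in use
        return weighted / weightTotal;
    }
}
A narrow window reacts faster (immediacy), a wide one is more robust against noise.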
A simple alternative to filtering is counting. If your data is above the curve threshold, don't call it a curve just yet. Count how many data points are above the threshold in a row. If there are more than n data points above the threshold, then you're most likely in a curve.
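A hypothetical sketch of that counting idea, reusing the ±400 limits from the question (the names and the run length of 3 are placeholders to tune):
// "Count before you commit": a curve is only reported after MIN_RUN consecutive samples
// beyond the threshold, which filters out single-sample noise without heavy smoothing.
final class SectorDetector {
    enum Sector { STRAIGHT, LEFT, RIGHT }

    private static final double THRESHOLD = 400;  // same hardcoded limit as in the question
    private static final int MIN_RUN = 3;         // consecutive samples required (tune for latency)

    private Sector current = Sector.STRAIGHT;
    private Sector candidate = Sector.STRAIGHT;
    private int runLength = 0;

    /** Feed one gyro Z value; returns the sector we currently believe we are in. */
    Sector update(double z) {
        Sector observed = z > THRESHOLD ? Sector.RIGHT
                        : z < -THRESHOLD ? Sector.LEFT
                        : Sector.STRAIGHT;
        if (observed == candidate) {
            runLength++;
        } else {
            candidate = observed;
            runLength = 1;
        }
        if (runLength >= MIN_RUN) {
            current = candidate;   // only commit after a consistent run of samples
        }
        return current;
    }
}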
Using derivatives is potentially problematic. The main reason against derivatives is that a curve is not defined by any derivative at all (at least no derivative of the force). The second problem is that you can only estimate the derivatives numerically, which is quite unstable with lots of noise. So you would have to smooth your data (or find a numerical scheme for your noise model), which again requires some latency.
I am trying to make a music visualizer in Processing, not that that part is super important, and I'm using a fast Fourier transform through Minim. It's working perfectly (reading the data), but there is a large spike on the left (bass) end. What's the best way to 'level' this out?
My source code is here, if you want to take a look.
Thanks in advance,
-tlf
The spectrum you show looks fairly typical of a complex musical sound where you have a complex section at lower frequencies, but also some clear harmonics emerging from the low frequency mess. And, actually, these harmonics are atypically clear... music in general is complicated. Sometimes, for example, if a flute is playing a single clear note one will get a single nice peak or two, but it's much more common that transients and percussive sounds lead to a very complicated spectrum, especially at low frequencies.
As for comparing directly to the video, it seems to me that the video is a bit odd. My guess is that the spectrum they show is either a zoom into a small section of the spectrum far from zero, or that it's just a graphical algorithm that's based on the music but doesn't correspond to an actual spectrum. That is, if you really want something that looks very similar to this video, you'll need more than the spectrum, though the spectrum will likely be a good starting point. Here are a few points to note:
1) There is a prominent peak which occasionally appears right above the "N" in the word "anchor". A single dominant peak should be clear in the audio as an approximately pure tone.
2) Occasionally there's another peak that varies temporally with this peak, which would normally be a sign that the second peak is a harmonic, but many times this second peak isn't there.
3) A good example of odd behavior is at 2:26. This time just follows a little laser sound effect, and then there's basically a quiet hiss. A hiss should be a broad-spectrum sound without peaks, often weighted to lower frequencies. At 2:26, though, there's just this single large peak above the "N" with nothing else.
It turns out what I had to do was multiply the data by
Math.log(i + 2) / 3
where i is the index of the data being referenced, zero-indexed from the left (bass).
You can see this in context here
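In context this amounts to something like the following sketch; forward(), specSize() and getBand() are standard Minim FFT calls, while fft and player stand in for whatever FFT object and audio source the sketch actually uses:
fft.forward(player.mix);                         // analyse the current audio buffer
float[] levelled = new float[fft.specSize()];
for (int i = 0; i < fft.specSize(); i++) {
  // bass bands (small i) are scaled down the most; higher bands progressively less
  levelled[i] = fft.getBand(i) * (float) (Math.log(i + 2) / 3.0);
}
// draw 'levelled' instead of the raw band magnitudes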
Edit: To get real-time drawing, I started using LWJGL (which jMonkeyEngine is based on) and JOCL, with OpenGL/OpenCL interoperability; now I can calculate and draw 100k particles in real time. Maybe a Mantle version of the jMonkey engine could cure this draw-call overhead problem.
For several days I have been learning the jMonkeyEngine (version 3.0) in Eclipse (64-bit Java) and trying to optimize a scene using the GeometryBatchFactory.optimize(rootNode); command.
Without optimization (with the ability to change the spheres' positions):
Okay, only 1 fps, which comes from both PCI Express bandwidth and JVM overhead.
With optimization (without the ability to change the spheres' positions):
Now it is 29 fps, even with an increased triangle count.
Java3D had a setCapability() method which made a scene object readable/writable even in an optimized form. jMonkeyEngine 3.0 should be capable of this too, but I couldn't find any trace of it (I searched the tutorials and examples without success).
Question: How can I set read/write position/rotation/scale capabilities on the optimized nodes of a scene in jMonkey 3.0? If you cannot answer the first question, can you tell me why the triangle count increases when I use the optimization command? Do I have to write a new method to access the graphics card and change the variables myself (JOGL maybe)?
Scene information: 16k particles (spheres of 16x16 resolution) + 1 point light (and its 4096-resolution shadow).
I'm sure we can send several thousand float numbers through PCI Express in a millisecond with ease.
Additional info: I'm using Aparapi kernels to update particle positions, which takes 10 milliseconds (16k x 16k interactions to calculate forces). (This does not change anything in optimized mode :( )
Can Aparapi access that optimized data?
For the batchNode.batch(); optimization, it is 1 fps again, even with fewer objects:
The object count is now only several hundred, but the fps is still 1!
Sending just the sphere positions to the GPU and letting it calculate the vertex positions could be better than calculating the vertices on the CPU and sending huge amounts of data to the GPU.
Is no one here able to help? I already tried batchNode, but it did not help enough.
I don't want to change the 3D API, because the jMonkey people have already reinvented the wheel and I'm happy with the current situation. I'm just trying to squeeze out a little more performance (disabling shadows gives a 100% speedup, but quality is important too!).
This Java program will become an asteroid-impact scene simulator (with a choice of asteroid size, mass, speed and angle) using the marching-cubes algorithm with LOD (there will be millions of particles).
The marching-cubes algorithm would decrease the triangle count greatly. If you can't answer the question above, any marching-cubes (or any O(n) convex hull) algorithm for Java will be accepted! Data: x, y, z arrays as the source and a triangle-strip array as the target (iso-surface mesh points).
Thanks.
Here are some samples from the stream (at a much lower resolution):
1) Collapse of a cube-shaped rock group under gravitation:
2) The exclusion force starts to show itself:
3) Exclusion force + gravitation make the group form a smoother shape:
4) The group forms a sphere (as expected):
5) Then a big stellar body approaches:
6) About to touch:
7) The moment of impact:
With the help of the Barnes-Hut algorithm and a truncated potential, the particle count will be 10x (maybe 100x) higher.
Rather than the marching-cubes algorithm, a ghost cloth that wraps the n-body could give a low-resolution hull (easier than BH, but it needs more computation).
The ghost cloth would be affected by the n-body (gravity + exclusion), but the n-body would not be affected by the cloth wrapping it. The n-body would not be rendered, but the cloth mesh would be rendered with a lower triangle count.
If marching cubes or the above works, this would let the program render a wrapping cloth for ~200x more particles.
So sorry....
You can batch all Geometries in a scene (or a subnode) that remains static.
Batching means that all Geometries with the same Material are combined into one mesh. This optimization only has an effect if you use only a few (roughly up to 32) Materials in total. The pay-off is that batching takes extra time when the game is initialized.
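As a rough sketch of how that typically looks (the geometry and variable names are placeholders, not your scene): batch only the part of the scene that stays static, and keep anything you still need to move outside the batch.
BatchNode staticRocks = new BatchNode("static rocks");
for (Geometry rock : staticRockGeometries) {     // geometries sharing a single Material
    staticRocks.attachChild(rock);
}
staticRocks.batch();                             // combined into one mesh -> far fewer draw calls
rootNode.attachChild(staticRocks);

rootNode.attachChild(movingSphere);              // left un-batched, so it can still be moved per frame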
The change in triangle count is therefore because they have all been assembled into one mesh. The only suggestion, if this is necessary, is to try to get the mesh and alter points on it, but at that point I don't think it makes sense.
Perhaps try a different optimization method.
Good luck! I haven't used JMonkey in a bit, but I'm glad to see others do, and to see its continued growth!
EDIT
BTW, a way to minimize the math might be to use half a sphere of cubes; an impact on the Earth likely wouldn't affect the other side (unless the sphere isn't the Earth but already a small sample of the Earth taken as a sphere)...
Perhaps try a 2D shape as the impact surface. Though I know this won't be your best choice, it might give you an idea of how much of an effect the number of shapes has. If it does have one, an avenue might be to consider how to remove some of the particles; if it doesn't, you need not worry. I am almost sure it will.
Finally:
Perhaps don't render in real time? Take a minute to draw the frames to a buffer, then play; by the time you're playing you will have another 40 or so frames ready, and maybe approximately 30 seconds' worth is all you will need.
There is a pretty solid set of documentation within the JMonkeyEngine wiki which talks quite a bit about how to utilize the transformations you are referring to, which can be found here: Advanced Spatial Concepts.
In addition, there is quite a bit of information regarding the meshes and their rendering which you can view here: Polygon Meshes.
I am investigating this field to obtain object detection in real time.
Video example:
http://www.youtube.com/watch?v=Bm5qUG-06V8
http://www.youtube.com/watch?v=aYd2kAN0Y20
But how can they extract SIFT keypoints and match them so fast?
SIFT extraction generally takes around a second.
I'm an OpenIMAJ developer and responsible for making the first video.
We're not doing anything particularly fancy to make the matching fast in that video, and the SIFT detection and extraction is carried out on the entirety of every frame. In fact that video was made well before we did any optimisation; the current version of that demo is much smoother. We do also have a version with a hybrid KLT-tracker that works even faster by not having to perform SIFT on every frame.
As suggested by @Mario, the image size does have a big effect on the speed of the extraction, so processing a smaller frame can give a big win. Secondly, in the original description of the difference-of-Gaussian interest point localisation in Lowe's SIFT paper, it was suggested that the input image first be doubled in size to increase the number of features. By not performing this double-sizing you also get a big performance boost, at the expense of having fewer features to match.
The code is open source (BSD license) and you can get it by following the links at http://www.openimaj.org. As stated in the video description, the image-processing code is pure Java; the only native code is a thin interface to the webcam. Tutorial number 7 in the current tutorial pdf document walks through the process of using SIFT in OpenIMAJ. Disabling the double-sizing can be achieved by doing:
DoGSIFTEngine engine = new DoGSIFTEngine();
engine.getOptions().setDoubleInitialImage(false);
SIFT can be accelerated in several ways:
If you can afford approximations, then you can switch to a keypoint called SURF, which is way faster (it uses integral images for most tasks).
You can use parallel implementations, at the CPU level (e.g. OpenCV uses Intel's TBB) or at the GPU level (google "SIFT GPU" for related code and documentation).
Anyway, none of these is available (AFAIK) in Java, so you'll have to use a Java wrapper to OpenCV or work it out yourself.
General and first idea: ask the video uploader(s). We can only guess at what's done or how it's done. It might also help to know what you've done so far (e.g. your video resolution, your processing power, image preparation, etc.).
I haven't used SIFT specifically, but I have done quite a bit of object/motion tracking over the last few years, so the following is more general advice. You may have tried some of these points already, I don't know.
Reduce your image resolution: going from 640x480 to 320x240 will reduce your data to 25%. Going down to 160x120 will cut it to 25% of that again (so 6.25% of the data left), without significantly impacting your algorithm. A plain Java2D downscale is sketched below.
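For instance, a plain Java2D downscale (illustrative; any image library will do) looks roughly like this:
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

final class FrameUtil {
    // Downscale a frame before feature extraction, e.g. 640x480 -> 320x240 (a quarter of the data).
    static BufferedImage downscale(BufferedImage src, int w, int h) {
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                           RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, w, h, null);   // draws src scaled into the w x h target
        g.dispose();
        return dst;
    }
}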
In a similar way, it might be useful to reduce the color depth of your image (not just to 256-level grayscale, but maybe even further, e.g. to 64 levels).
Try other methods to make features more obvious or faster to find, e.g. try running an edge detector over your image.
At least the second video mentions a tracking system, so you could try to guess the region where the tracked object should reappear in the next frame (using some simple alpha/beta filter or whatever on the coordinates and possibly the rotation), then use SIFT on that sub-area (with some added margin) only. Only analyze the whole image if you can't find it again. At around 40 or 50 seconds into the second video they lose the object and need quite some time/several tries to find it again.
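For illustration, a minimal alpha-beta filter on a single coordinate could look like this (the class name and gains are placeholders; run one instance per tracked coordinate):
// Predicts where the tracked object should be next frame, so SIFT can be run
// on a small sub-window centred on the prediction instead of the whole image.
final class AlphaBetaTracker {
    private double pos, vel;
    private final double alpha, beta, dt;

    AlphaBetaTracker(double initialPos, double alpha, double beta, double dt) {
        this.pos = initialPos;
        this.alpha = alpha;   // how strongly a measurement corrects the position
        this.beta = beta;     // how strongly a measurement corrects the velocity
        this.dt = dt;         // time between frames
    }

    /** Predicted position for the next frame (where to centre the search window). */
    double predict() {
        return pos + vel * dt;
    }

    /** Fold in the position actually measured by the matcher this frame. */
    void correct(double measuredPos) {
        double predicted = pos + vel * dt;
        double residual = measuredPos - predicted;
        pos = predicted + alpha * residual;
        vel = vel + beta * residual / dt;
    }
}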
I saw this video, and I am really curious how it was performed. Does anyone have any ideas? My intuition is that he scraped pixels from the screen (one per 'box'), and then fed that into some program to determine the next move.
Is scraping pixel-by-pixel the way to do this, or is there a better way? I am looking to do something similar with either Java or Python.
Thanks
Probably that's the most reliable way. There are ways to inspect what is happening inside a process - looking directly at its internal state and memory - but they are platform-specific and very prone to misbehaving because you're dealing with a slightly different version of something - that includes a different Flash version as well as a different version of the app. Those methods are more often used for "trainers" for exe games, where there's typically only one or two versions of the executable to worry about.
Lots of screenshots, comparing, and figuring out reliable indicator pixels seems the way to go - plus keeping track of what you expect to happen, of course. When the app is running, it should work from one screenshot at a time (hopefully ensuring a consistent picture, with no half-updated views) and then test the minimum number of pixels needed using (perhaps) a decision tree.
There are ways to automate construction of efficient decision trees, but it's probably easier to do it manually based on comparing screen shots. In this case, since Tetris normally creates all new pieces at the same position, with a 1:1 relationship between colour and shape, you can probably determine the shape and position of a new piece from a single pixel colour - so "decision tree" is probably the wrong term, really, in this case - though there are other things the bot needs to read from the screen.
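A minimal sketch of the screenshot-and-probe approach using java.awt.Robot (the rectangle and pixel coordinates are placeholders for wherever the playfield and spawn area actually sit on your screen):
import java.awt.Color;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.image.BufferedImage;

public class BoardReader {
    public static void main(String[] args) throws Exception {
        // Placeholder region: wherever the game's playfield happens to be on screen.
        Rectangle playfield = new Rectangle(100, 50, 320, 640);
        Robot robot = new Robot();

        // One capture per frame, so every probe sees the same, fully drawn picture.
        BufferedImage frame = robot.createScreenCapture(playfield);

        // Probe the pixel where new pieces spawn; in standard Tetris each piece type
        // has its own colour, so a single pixel can identify the new shape.
        Color probe = new Color(frame.getRGB(160, 20));   // placeholder coordinates
        System.out.println("Spawn pixel colour: " + probe);
        // ...map 'probe' to a piece type via a small colour -> shape lookup table...
    }
}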
What's more interesting is the logic to actually make gameplay decisions, since that bot clearly isn't just slotting every piece into the most immediately obvious position, but deliberately aiming to create opportunities to clear 3 or 4 rows at a time.
Yes, I think he scanned the pixels. Actually it should be very simple, because you only need to scan the new shape for each move. With that information you can locally reconstruct the grid and then use it for your AI calculations.