I am investigatin this field to obtain object detection in real time.
Video example:
http://www.youtube.com/watch?v=Bm5qUG-06V8
http://www.youtube.com/watch?v=aYd2kAN0Y20
But how can they extract sift keypoint and matching them so fast?
SIFT extraction requires a second generally
I'm an OpenIMAJ developer and responsible for making the first video.
We're not doing anything particularly fancy to make the matching fast in that video, and the SIFT detection and extraction is carried out on the entirety of every frame. In fact that video was made well before we did any optimisation; the current version of that demo is much smoother. We do also have a version with a hybrid KLT-tracker that works even faster by not having to perform SIFT on every frame.
As suggested by #Mario, the image size does have a big effect on the speed of the extraction, so processing a smaller frame can give a big win. Secondly, in the original description of the difference of Gaussian interest point localisation suggested by Lowe in the SIFT paper, it was suggested that the input image was first doubled in size to increase the number of features. By not performing this double-sizing you also get a big performance boost at the expense of having fewer features to match.
The code is open source (BSD license) and you can get it by following the links at http://www.openimaj.org. As stated in the video description, the image-processing code is pure Java; the only native code is a thin interface to the webcam. Tutorial number 7 in the current tutorial pdf document walks through the process of using SIFT in OpenIMAJ. Disabling the double-sizing can be achieved by doing:
DoGSIFTEngine engine = new DoGSIFTEngine();
engine.getOptions().setDoubleInitialImage(false);
SIFT can be accelerated in several ways :
if you can afford approximations, then you can derive a keypoint called SURF which is way faster (using integral images for most tasks)
you can use parallel implementations, at the CPU level (e.g. OpenCV uses Intel's TBB) or at the GPU level (google for sift gpu for related code and doc).
Anyway, none of these is available (AFAIK) in Java, so you'll have to use a Java wrapper to opencv or work it out yourself.
General and first idea: Ask the video uploader(s). We can just assume what's done or how it's done. It might also help to know what you've done so far (e.g. your video resolution, your processing power, image preparation, etc.).
I haven't used SIFT specifically, but I did quite some object/motion tracking during the last few years, so this is more in general. You might have tried some points already, I don't know.
Reduce your image resolution: Going from 640x480 to 320x240 will reduce your data to 25%. Going down to 160x120 will cut it by another 25% (so 6.25 % data left) without significantly impacting your algorithm.
In a similar way, it might be useful to reduce the color depth of your image (not just 256 grayscale, but maybe even more; like 64 colors).
Try other methods to make features more obvious or faster to find, e.g. try running an edge detector over your image.
At least the second video mentions a tracking system, so you could try to guess the region where the object tracked should reappear the next frame (using some simple a/b filter or whatever on coordinates and possibly rotation), then use SIFT on that sub area (with some added margin) only. Only analyze the whole image if you can't find it again. At around 40 or 50 seconds in the second video they're losing the object and need quite some time/tries to find it again.
Related
I'm busy coding Conways Game of Life and I'm trying to optimise it using some data structure that records which cells should be checked on each life cycle.
I'm using an arrayList as a dynamic data structure to keep a record of all living cells and their neighbours. Is there a better data structure or way of keeping a shorter list that will increase the games speed?
I ask this because often many cells are checked but not changed so I feel like my implementation could be improved.
I believe that the Hashlife algorithm could help you.
It gives the idea of using a quadtree (tree data structure in which each internal node has exactly four children) to keep the data, and then it uses hash tables to store the nodes of the quadtree.
For further reading, this post, written by Eric Burnett, gives a great insight about how Hashlife works, it's performance and implementation (although in Python). It's worth a read.
I built a Life engine that operated on 256x512 bit grids directly mapped to screen pixels back in the 1970s, using a 2Mhz 6800 8 bit computer. I did it directly on the display pixels (they were one-bit on/off white/black) because I wanted to see the results and didn't see the point in copying the Life image to the display.
Its fundamental trick was to treat the problem as one of evaluating a Boolean logic formula for "this cell is on" based on rules of Life, rather than counting live neighbors as is usual. This formula is pretty easy to figure out, so left as a homework exercise. What made it fast was that the Boolean formula was computed on a per-bit basis, doing 8 bits at a time. If you sweep down the screen and across rows, you can in essence evaluate N bits at once (8 on the 6800, 64 on modern PCs) with very low overhead. If you go nuts, you can likely use the SIMD vector extensions and do 256 bits or more at "once". Over the top would be doing this with a GPU.
The 6800 version would process a complete screen in about .5 second; you could watch the update ripple down the screen from top to bottom (60 Hz refresh). On a modern CPU with 1000x the clock rate (1 GHz is pretty easy to get) and 64 bits at a time, it should be able to produce thousands of frames per second. So fast you can't watch it run :-{
A useful observation is that much of the Life world is dead (blank) and processing that part mostly produces more dead cells. This suggests using a sparse representation. Another poster suggested quadtrees, which I think is a very good suggestion. Your quadtree regions don't have to be square, either.
Combining the two ideas, quadtrees for non-blank regions with bit-level processing for blocks of bits designated by the quadtrees is likely to give an astonishingly fast Life algorithm.
Edit: For having real-time drawing, started using lwjgl which is base of jmonkeyengine and jocl in an "interoperability" between opengl and opencl, now can calculate and draw 100k particles real-time. Maybe mantle version of jmonkey engine can cure this drawcall overhead problem.
For several days, I have been learning jMonkey engine(ver:3.0) in Eclipse(java 64 bit) and trying how to optimize a scene with using GeometryBatchFactory.optimize(rootNode); command.
Without optimization(with capability of changing spheres positions):
Okay, only 1-fps is originated from both pci-express bandwidth+jvm overhead.
With optimization(without capability to change positions of spheres):
Now it is 29 fps even with increased triangle number.
Java3D had a setCapability() method which makes a scene object be able to be read/written even in an optimized form. jMonkey engine 3.0 must be capable of this subject but I couldn't find any trace of it(searched tutorials and examples, failed).
Question: How can I set read/write position/rotation/scale capabilities of optimized nodes of a scene in jMonkey 3.0? If you cannot give an answer to first question, can you tell me why triangle numbers increase when I use optimization command? Do I have to create a new method to access the graphics card and change the variables myself(jogl maybe?)?
Scene information: 16k particles(spheres of 16x16 res) + 1 point light(and its 4096 resolutioned shadow).
I'm sure we can send several thousands of float numbers in a millisecond through pci-express with ease.
Additional info: I'm using Aparapi-kernels to update particle
positions which takes 10 milliseconds(16k * 16k interactions to
calculate forces).(does not change anything in optimized mode :( )
Can aparapi access those optimized data?
For the case of batchNode.batch(); optimization, here is 1 fps again with lessened object-numbers:
Object number is now only several hundreds but fps is still at 1!
Sending just sphere positions to gpu and letting it calculate the vertex positions could be better than calculating vertexes on cpu plus sending huge data to gpu.
No-one here to help? Already tried batchNode but did not help enough.
I dont want to change 3d api because jMonkey people already reinvented the wheel and I'm happy with current situation. Just trying to squeeze a little more performance(canceling shadows gives %100 speed but quality is important too!).
This java program will become an asteroid-impact scene simulator(there will be choice of asteroid size,mass,speed,angle) with marching-cubes algorithm with LOD(will be millions of particles).
Marching-cubes algorithm would decrease the triangle numbers greatly. If you couldnt give any answer the question, any marching-cubes(or any O(n) convex hull) algorithm for java will be accepted! Data: x,y,z arrays as source and triangle-strip-array as target(iso-surface mesh points)
Thanks.
Here are some samples about the stream(with a much lower resolution):
1)Collapsing of a cube-shaped rock-group by gravitation:
2)Exclusion force starts to show itself:
3)Exclusion force + gravitation makes the group form a more smooth shape:
4)Group forms a sphere(as expected):
5)Then, a big stellar body approaches:
6)About to touch:
7)The moment of impact:
With help of Barnes-Hutt algorithm and a truncated potential, particle numbers will be 10x(maybe 100x) more.
Rather than Marching-Cubes algorithm, a ghost cloth which wraps the nbody can give a low-resolutioned hull(more easier than BH but need more computation)
Ghost cloth will be affected by nbody(gravity + exclusion) but nbody will not be affected by cloth which wraps it. Nbody wont be rendered but cloth mesh will be rendered with lower triange count.
If MC or above works, this will let the program render a wrapping-cloth for ~200x more particles.
So sorry....
You can batch all Geometries in a scene (or a subnode) that remains static.
Batching means that all Geometries with the same Material are combined into one mesh. This optimization only has an effect if you use only few (roughly up to 32) Materials total. The pay-off is that batching takes extra time when the game is initialized
The change in triangles therefore is because they have been all assembled into one mesh.... The only suggestion if this is necessary, is trying to get the mesh and altering points on it, but at that point I don't think it makes sense.
Perhaps try a different optimization method.
Good luck, haven't used JMonkey in a bit, but glad to see others do and its continued growth!
EDIT
BTW, a way to minimize the math might be to use half a sphere of cubes, an impact on the earth likely wouldn't affect the other side (unless the sphere isn't the earth but already a small sample of the earth taken as a sphere)...
Perhaps try a 2d shape as the impact surface, though I know this won't be your best choice, it might give you an idea of how the number of shapes might have an affect and how grand. If it does then an avenue might be to consider how to remove some of the particles, if it doesn't you need not worry. I am almost sure it will.
Finally:
Perhaps don't render in real time? Take a minute to draw the frames to a buffer then play, by the time your playing you will have another 40 or so frames etc... and maybe approx 30 secs worth is all you will need.
There is a pretty solid set of documentation within the JMonkeyEngine wiki which talks quite a bit about how to utilize the transformations you are referring to, which can be found here: Advanced Spatial Concepts.
In addition, there is quite a bit of information regarding the meshes and their rendering which you can view here: Polygon Meshes.
How can I take two images and compare them to see how similar they are?
I'm not talking about comparing two exact images using MD5. The two images that I am comparing will be completely different, as well as likely different sizes at times.
Using Pokemon cards as an example:
I'm going to have scanned HD images of each of the cards. I want the user to be able to take a picture of their Pokemon card with their phone and I want to be able to compare it against my scanned images and then determine which card it is that they took a picture of.
The processing does not have to be done directly on the phone, offloading to a web service is an option however note that my knowledge somewhat limited on the programming languages (limited to PHP/JAVA/Android pretty much). The server I'm using is my own Ubuntu server so I do have access to the exec command from php if this would help.
At first I figured someone would have done something like this before (comparing two images). I tried using php with imageik using an example I found that claimed to do what I was trying ( utilizing compareImages() ), but it didn't work at all. There doesn't seem to be much (if any) documentation on doing something like this which is why I'm so stuck. All I'm looking for is a push in the right direction.
My second thought was to try using OCR to pull just the title of the card and I would just compare that against a database of titles and display the images tied to that title. So far I've tried using phpocr first, which didnt work at all as it requires monochrome images to my understanding. Next I tried tesseract directly from the console on my server, and while it did WAY better than phpocr, more than 80% of the characters were either wrong or incorrect on a scanned image, so a lower quality image coming from a smart phone would really have troubles.
I also tried OpenCV for Android but couldnt get any of the samples working.
Has anyone done anything like this, or at least used something that can accomplish what Im looking for?
There are two distinct tasks - identify area of interest ( which can be done with Haar cascades - same as face detection ) and recognition of identified image which can be
done with invariant moment techniques (like Hu moments - it was good enough to count soviet tanks on satellite images so it shall be good for pokemons). Nice property of invariant moments is soft degradation of results in case of low quality - you get a list of probability for symbols - like this is 80% pikachu and 30% something else.
We are developing OCR library based on invariant moments for use in android here:
https://sourceforge.net/projects/javaocr/
(
pure java and reasonable speed , and there are android samples in demos subdirectory.
And here is app based on javaocr, it will recognize black on white phone number and dial it: https://play.google.com/store/apps/details?id=de.pribluda.android.ocrcall&feature=search_result#?t=W251bGwsMSwyLDEsImRlLnByaWJsdWRhLmFuZHJvaWQub2NyY2FsbCJd
)
You may also consider some aiming help so user positions symbol to be matched properly
( so first task will use real intellect )
You should decide what kind of similarity comparison you need. There are geometric algorithms. They use edge detection and then try to match detected edges in both images. They are probably useful when dealing with different colours of objects with the same shape. And there are algorithms that are more based on colour similarity. They compare what colours are in the image and how they are distributed.
If you are looking for a concrete algorithm, you probably should have a look at the Hough Transform.
i was wondering if anyone has knowledge on the recontruction of 3D objects from live video feed. Does any have any java based examples or papers JAVA based that i could be linked to as i have read up on algorithm's used to produce such 3d objects. If possible i would like to construct something such as the program demostrated in the link provided below.
Currently my program logs live video feed.
http://www.youtube.com/watch?v=brkHE517vpo&feature=related
3D reconstruction of an object from a single point of view is not really possible. You have two basic alternatives: a) To have a stereo camera system capturing the object, b) To have only one camera, but rotating the object (so you will have different points of view of the object), like the one in the video. This is a basic concept related with epipolar geometry.
There are other alternatives, but more intrusive. Some time ago I've been working on a 3D scanner based on a single camera and a laser beam.
For this, I used OpenCV which is C++ code, but now I think there are ports for Java. Have in mind that 3D reconstruction is not an easy task, and the resulting app. will have to be largely parametrized to achieve good results.
This isn't a solved problem - certain techniques can do it to a certain degree under the right conditions. For example, the linked video shows a fairly simple flat-faced object being analysed while moving slowly under relatively even lighting conditions.
The effectiveness of such techniques can also be considerably improved if you can get a second (stereo vision) video feed.
But you are unlikely to get it to work for general video feeds. Problem such as uneven lighting, objects moving in front of the camera, fast motion, focus issues etc. make the problem extremely hard to solve. The best you can probably hope for is a partial reconstruction which can then be reviewed and manually edited to correct the inevitable mistakes.
JavaCV and related projects are probably the best resource if you want to explore further. But don't get your hopes too high for a magic out-of-the-box solution!
I've been using Incanter for my graphing needs, which was adequate but slow for my previous needs.
Now I need to embed a graph in a JPanel. Users will need to interact with the graph (e.g. clicking on certain points which the program would need to receive and deal with) by dragging and clicking. Zooming in a out is a must as well.
I've heard about JFreeChart on other SO discussions, but I see that Incanter uses that as it's graphing engine, and it seemed somewhat slow then. It it actually fast, but perhaps Incanter is doing things that slow it down?
I'm graphing up to 2 million points (simple xy-plots, really), though generally will be graphing less. Using Matlab, this is plotted in a few seconds, but Incanter can hang for minutes.
So is JFreeChart the way to go? Or something else, given my needs?
(Also, it needs to be free, as it is for research.)
Unfortunately, general purpose graphing solutions probably aren't going to scale well to 2 million points - that's big enough that you will need something specialized if you want interactive performance.
I can see a few sensible options:
Write your own custom "plotter" function that is optimized for drawing large numbers of points. You'd have to test it, but I think you might just about get the performance you want by writing the points directly to a BufferedImage using setRGB in a tight loop. If that still isn't fast enough, you can write the points directly into a byte array and construct a MemoryImageSource.
Exclude points so that you are only drawing e.g. 10,000 points maximum. This may be perfectly acceptable as long as you only really care about the overall shape of the scatter plot rather than individual points.
Pre-render all the points into e.g. a large BufferedImage then allow users to zoom in and out / interact with this static image. You might be able to "hack" JFreeChart to do this.
If OpenGL is an option (will require native code + getting up a steep learning curve!), then drop all the points in a big vertex array and get the graphics card to do it all for you. Will handle 2 million points in real-time without any difficulty.
MathGL is fast and free (GPL) plotting library. But I never test its java interface (swig based) since I'm not familiar with java :( . So, if one can help with testing then I'll be thankful.