For a small project we're trying to implement an autopilot for a slot car. A gyro sensor is attached to the car and delivers the Z-value (meaning the amount of centrifugal force acting on the car/sensor) 20 times per second. One crucial part of this is detecting whether the car is in a curve or on a straight section, and exactly when it entered and left that section. Only then can we reliably predict what will happen next.
As for now, we're working with a sliding window to smooth the data and then have hardcoded limits (-400 for a left curve and +400 for a right curve) to detect what kind of sector (left, right, straight) we're in.
Obviously this takes too long: because of the smoothing and the hardcoded limits, it takes a couple of messages before the program detects a direction change.
Here's an example of two rounds on a simple track, starting at the checkered area:
A perfect algorithm would detect the sectors S R S R S L S R S R S R S for one round, with a delay of only a couple of data points.
We thought about using the first derivative of the gyro values, but in the sample graph right after the first left curve, the following right curve (between 22:36:40 and 22:36:42) shows signs of swerving. Here the first derivative would be close to 0 and indicate a straight part...
Also, we'd need to set a hardcoded threshold again, and with the noise in the data, a small bump in the track could produce a spike whose derivative exceeds the threshold.
Now we're not sure about what would be the easiest/fastest/most reliable way to handle this sort of detection. Would using a derivative be a good idea? Is there a better way?
Any input would be greatly appreciated :)
The existing software is written in Java.
In such problems, you have to trade robustness for immediacy. If you don't know what happens in the future, you can only make assumptions. And these assumptions may hold or may not.
From the looks of your data, there shouldn't be any smoothing necessary. If you define a reasonable threshold, the curves should be recognized quite reliably. If, however, this is not the case, here are some things you could try:
You already mentioned smoothing. The crucial point is how you smooth. An asymmetric smoothing kernel is probably desirable (a half-triangle filter can be updated in constant time). You can trade robustness against immediacy directly by modifying the kernel width.
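For illustration, here's a minimal Java sketch of such a half-triangle filter with a constant-time update. The window size is arbitrary, and the circular-buffer bookkeeping is just one way to do it:

/** A minimal sketch of a half-triangle (ramp) filter with an O(1) update,
 *  as an example of an asymmetric smoothing kernel. The newest sample gets
 *  the largest weight, so the output reacts faster to direction changes
 *  than a plain moving average. The window size is just an example. */
public class HalfTriangleFilter {
    private final double[] buffer;
    private int head = 0;           // index of the oldest sample
    private double sum = 0;         // plain sum of the window
    private double weightedSum = 0; // sum of i * x_i, newest sample has weight k

    public HalfTriangleFilter(int windowSize) {
        buffer = new double[windowSize];
    }

    /** Add a new gyro value and return the smoothed value. */
    public double update(double value) {
        int k = buffer.length;
        double oldest = buffer[head];
        // Shift all weights down by one and add the new sample with weight k.
        weightedSum = weightedSum - sum + k * value;
        sum = sum - oldest + value;
        buffer[head] = value;
        head = (head + 1) % k;
        return weightedSum / (k * (k + 1) / 2.0); // normalize by the sum of weights 1..k
    }
}

Because the newest sample carries the largest weight, the output reacts to a direction change noticeably faster than a symmetric moving average of the same width.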
A simple alternative to filtering is counting. If your data is above the curve threshold, don't call it a curve just yet. Count how many data points are above the threshold in a row. If there are more than n data points above the threshold, then you're most likely in a curve.
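A rough Java sketch of that counting idea might look like this; the ±400 limits come from the question, while the count of 3 consecutive samples is an arbitrary value you would have to tune:

/** A minimal sketch of the counting idea: a sample beyond the threshold is
 *  only treated as a curve once it has been beyond it for several samples
 *  in a row. Threshold and count values are examples, not tuned ones. */
public class SectorDetector {
    enum Sector { STRAIGHT, LEFT, RIGHT }

    private static final double THRESHOLD = 400; // same limits as in the question
    private static final int MIN_COUNT = 3;      // samples in a row before switching

    private Sector current = Sector.STRAIGHT;
    private Sector candidate = Sector.STRAIGHT;
    private int count = 0;

    /** Feed one raw gyro Z-value; returns the sector we believe we are in. */
    public Sector update(double gyroZ) {
        Sector observed;
        if (gyroZ <= -THRESHOLD) observed = Sector.LEFT;
        else if (gyroZ >= THRESHOLD) observed = Sector.RIGHT;
        else observed = Sector.STRAIGHT;

        if (observed == candidate) {
            count++;
        } else {
            candidate = observed;
            count = 1;
        }
        if (count >= MIN_COUNT && candidate != current) {
            current = candidate; // only switch after MIN_COUNT consistent samples
        }
        return current;
    }
}

At 20 samples per second, requiring 3 consistent samples adds about 150 ms of latency, which is exactly the robustness/immediacy trade-off mentioned above.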
Using derivatives is potentially problematic. The main reason against derivatives is that a curve is not defined by any derivative at all (at least no derivative of the force). The second problem is that you can only estimate the derivatives numerically, which is quite unstable with lots of noise. So you would have to smooth your data (or find a numerical scheme for your noise model), which again requires some latency.
Given a set of X/Y coordinates [(x, y)] with increasing X (representing a timestamp) and Y representing a value/measurement at that timestamp.
This set can be huge, and I would like to avoid returning every single point in the set for display; rather, I'd like to find a smaller subset that represents the overall trend of the measurement (some level of accuracy loss in the line graph will be acceptable).
So far, I tried simple uniform sampling, skipping points at a fixed interval and then adding the max/min measurement values to the subset. While this is simple, it doesn't account well for local peaks or valleys if the measurement fluctuates often.
I'm wondering if there are any standard algorithms that deal with this type of problem on the server side?
I'd appreciate it if anyone has solved this or knows of any utility/common libraries solving such problems. I'm on Java, but given a reference to a standard algorithm I might try to implement it myself.
It's hard to give a general answer to this question. It all depends on how your datapoints are stored, what properties your chart has, how it is rendered etc.
But as @dmuir suggested, you should check out the Douglas-Peucker algorithm. Another approach I just thought of is to split the input data into chunks of some size (maybe corresponding to a single horizontal pixel) and then use some statistic (min, max, or average) to render each chunk. If you use running statistics when adding data points to a chunk, this is O(n), so it's no more expensive than reading your data points in the first place.
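For illustration, here's a rough Java sketch of that chunking idea, keeping the min and max of each chunk so local peaks and valleys survive the downsampling. The Point class and the chunk size are placeholders:

import java.util.ArrayList;
import java.util.List;

/** A minimal sketch of chunked min/max downsampling: split the series into
 *  fixed-size chunks (e.g. one per horizontal pixel) and keep only the min
 *  and max of each chunk. Point, chunk size and the min/max choice are all
 *  illustrative. */
public class MinMaxDownsampler {
    public static class Point {
        public final long x;   // timestamp
        public final double y; // measurement
        public Point(long x, double y) { this.x = x; this.y = y; }
    }

    public static List<Point> downsample(List<Point> data, int chunkSize) {
        List<Point> result = new ArrayList<>();
        for (int start = 0; start < data.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, data.size());
            Point min = data.get(start), max = data.get(start);
            for (int i = start + 1; i < end; i++) {
                Point p = data.get(i);
                if (p.y < min.y) min = p;
                if (p.y > max.y) max = p;
            }
            // Emit the chunk's extremes in time order so the line stays monotone in x.
            if (min.x <= max.x) { result.add(min); if (min != max) result.add(max); }
            else                { result.add(max); result.add(min); }
        }
        return result;
    }
}

With a chunk size of roughly totalPoints / pixelWidth you end up with at most two points per horizontal pixel, which is usually plenty for a line chart.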
What I seek is to turn a grid into a somewhat "random" plane of tiles.
I tried just multiplying Math.random() individually with the width and height of the plane (in this case it's 800 × 600). The circles you see there are points that intersect each other and have been removed from the scene.
As you can see, it looks very far from an "evenly distributed" field of points. There are large holes and just as bad, clusters of points can be seen.
What I am looking for is a way to distribute these points better to have a minimum amount of clusters and holes. Ideally, to have a value that is the minimum distance between any two points, while having the maximum number of points that can fit in the area. I am fine with approximations of all kinds, I just don't want to attempt to do a greedy distribution.
Whatever ECMAScript solution you give is fine; I can convert it to ActionScript.
I have found a visual example. The left side is what I got and the right is what I aim for.
You can try Lloyd's algorithm, i.e. centroidal Voronoi diagrams: compute the Voronoi diagram, then the center of gravity of each cell, replace the old points with those centroids, and rinse and repeat: http://www-cs-students.stanford.edu/~amitp/game-programming/polygon-map-generation/.
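If you don't want to implement an exact Voronoi diagram, a crude sample-based approximation of Lloyd's relaxation already gets you most of the way: assign a dense grid of sample positions to their nearest point, move each point to the centroid of the samples it owns, and repeat. A rough Java sketch, with arbitrary plane size, grid resolution and iteration count:

import java.util.Random;

/** Rough, sample-based approximation of Lloyd's relaxation (not an exact
 *  Voronoi computation): each iteration assigns a dense grid of sample
 *  positions to their nearest point and moves every point to the centroid
 *  of the samples it owns. Dimensions and iteration count are arbitrary. */
public class LloydRelaxation {

    public static void relax(double[][] points, double width, double height,
                             int iterations, int gridRes) {
        int n = points.length;
        for (int it = 0; it < iterations; it++) {
            double[] sumX = new double[n];
            double[] sumY = new double[n];
            int[] count = new int[n];

            // Assign every grid sample to its nearest point (discrete Voronoi cells).
            for (int gx = 0; gx < gridRes; gx++) {
                for (int gy = 0; gy < gridRes; gy++) {
                    double sx = (gx + 0.5) * width / gridRes;
                    double sy = (gy + 0.5) * height / gridRes;
                    int best = 0;
                    double bestDist = Double.MAX_VALUE;
                    for (int i = 0; i < n; i++) {
                        double dx = points[i][0] - sx, dy = points[i][1] - sy;
                        double d = dx * dx + dy * dy;
                        if (d < bestDist) { bestDist = d; best = i; }
                    }
                    sumX[best] += sx;
                    sumY[best] += sy;
                    count[best]++;
                }
            }

            // Move each point to the centroid of its cell.
            for (int i = 0; i < n; i++) {
                if (count[i] > 0) {
                    points[i][0] = sumX[i] / count[i];
                    points[i][1] = sumY[i] / count[i];
                }
            }
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        double[][] points = new double[100][2];
        for (double[] p : points) { p[0] = rnd.nextDouble() * 800; p[1] = rnd.nextDouble() * 600; }
        relax(points, 800, 600, 20, 128); // 20 iterations over a 128x128 sample grid
    }
}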
In general, it is a non-trivial problem, and there are many different approaches.
One that I have liked, since it is fast and produces decent results, is the quasi-random number generator from this article: "The Unreasonable Effectiveness of Quasirandom Sequences"
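For illustration, here's a tiny Java sketch of the R2 sequence from that article. The constant is the plastic number, the 0.5 offset is the seed suggested there, and the 800 × 600 plane and point count are just examples:

/** A minimal sketch of the R2 quasirandom sequence described in that article.
 *  The constant g is the plastic number; the 0.5 seed offset is the one the
 *  article suggests. Plane size and point count are examples only. */
public class R2Sequence {
    public static void main(String[] args) {
        final double g  = 1.32471795724474602596; // plastic constant
        final double a1 = 1.0 / g;
        final double a2 = 1.0 / (g * g);
        double width = 800, height = 600;
        int numPoints = 200;

        for (int n = 1; n <= numPoints; n++) {
            double x = (0.5 + a1 * n) % 1.0; // fractional part in [0, 1)
            double y = (0.5 + a2 * n) % 1.0;
            System.out.printf("%.2f %.2f%n", x * width, y * height);
        }
    }
}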
Other approaches are generally iterative, where the more iterations you do, the better results. You could look up "Mitchell's Best Candidate", for one. Another is "Poisson Disc Sampling".
There are innumerable variations on the different algorithms depending on what you want — some applications demand certain frequencies of noise, for instance. But if you just want something that "looks okay", I think the quasirandom one is a good starting point.
Another cheap and easy one is a "jittered grid", where you evenly space the points on your plane, then randomly adjust each one a small amount.
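A minimal Java sketch of such a jittered grid might look like this; the grid resolution and jitter factor are arbitrary, but keeping the jitter below one cell width guarantees a minimum spacing between points:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** A minimal jittered-grid sketch: lay points out on a regular grid, then
 *  nudge each one by a random offset smaller than half a cell so that no two
 *  points can collide. The 800x600 plane is just an example. */
public class JitteredGrid {
    public static List<double[]> generate(double width, double height,
                                          int cols, int rows, double jitter, Random rnd) {
        double cellW = width / cols, cellH = height / rows;
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < cols; i++) {
            for (int j = 0; j < rows; j++) {
                double x = (i + 0.5) * cellW + (rnd.nextDouble() - 0.5) * jitter * cellW;
                double y = (j + 0.5) * cellH + (rnd.nextDouble() - 0.5) * jitter * cellH;
                points.add(new double[]{x, y});
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // jitter = 0.8 keeps every point inside its own cell, so a minimum spacing is guaranteed
        List<double[]> pts = generate(800, 600, 20, 15, 0.8, new Random());
        System.out.println(pts.size() + " points generated");
    }
}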
I am trying to make a music visualizer in Processing, not that that part is super important, and I'm using a fast fourier transform through Minim. It's working perfectly (reading the data), but there is a large spike on the left (bass) end. What's the best way to 'level' this out?
My source code is here, if you want to take a look.
Thanks in advance,
-tlf
The spectrum you show looks fairly typical of a complex musical sound where you have a complex section at lower frequencies, but also some clear harmonics emerging from the low frequency mess. And, actually, these harmonics are atypically clear... music in general is complicated. Sometimes, for example, if a flute is playing a single clear note one will get a single nice peak or two, but it's much more common that transients and percussive sounds lead to a very complicated spectrum, especially at low frequencies.
For comparing directly to the video, it seems to me that the video is a bit odd. My guess is that the spectrum they show is either a zoomed-in view of a small section of the spectrum away from zero, or just a graphical effect that's driven by the music but doesn't correspond to an actual spectrum. That is, if you really want something to look very similar to this video, you'll need more than the spectrum, though the spectrum will likely be a good starting point. Here are a few points to note:
1) there is a prominent peak which occasionally appears right above the "N" in the word anchor. A single dominant peak should be clear in the audio as an approximately pure tone.
2) occasionally there's another peak that varies temporally with this peak, which would normally be a sign that the second peak is a harmonic, but many times this second peak isn't there.
3) A good example of odd behavior is at 2:26. This moment just follows a little laser sound effect, and then there's basically a quiet hiss. A hiss should be a broad-spectrum sound without peaks, often weighted toward lower frequencies. At 2:26, though, there's just this single large peak above the "N" with nothing else.
It turns out what I had to do was multiply the data by
Math.log(i + 2) / 3
where i is the index of the data being referenced, zero-indexed from the left (bass).
You can see this in context here
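For anyone else hitting the same spike, here's a minimal sketch of that weighting in Java, operating on whatever per-band spectrum values your FFT (e.g. Minim's) gives you:

/** A minimal sketch of the weighting described above: each spectrum band is
 *  scaled by log(i + 2) / 3 before drawing, which suppresses the large bass
 *  spike on the left. The spectrum array stands in for whatever per-band
 *  values your FFT produces. */
public class SpectrumWeighting {
    public static float[] weight(float[] spectrum) {
        float[] scaled = new float[spectrum.length];
        for (int i = 0; i < spectrum.length; i++) {
            scaled[i] = spectrum[i] * (float) (Math.log(i + 2) / 3.0);
        }
        return scaled;
    }
}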
I wrote an object tracker that tries to detect and follow a moving object in a recorded video. In order to maximize the detection rate, my algorithm uses a bunch of detection & tracking algorithms (cascade, foreground & particle tracker). Each tracking algorithm returns some points of interest that might be part of the object that I'm trying to track. Let's assume (for the simplicity of this example) that my object is a rectangle and that the three tracking algorithms returned the points 1, 2 and 3:
Based on the relation / distance of these three points it is possible to calculate the center of gravity (blue X in above image) of the tracked object. So for each frame I might be able to come up with some good estimate of the center of gravity. However, the object might move from one frame to the next:
In this example I merely rotated the original object. My algorithm will give me three new points of interest: 1', 2' and 3'. I could again calculate the center of gravity based on these three new points, but that would throw away important information acquired from the previous frame: based on points 1, 2 and 3 I already know something about the relationship between these points, so by combining the information from 1, 2, 3 and 1', 2', 3' I should be able to come up with a better estimate of the center of gravity.
Furthermore, the next frame might yield a fourth data point:
This is what I would like to do (but I don't know how):
Based on the individual points (and their relationship to each other) that are returned by the different tracking algorithms, I want to build up a localization map of the tracked object. Intuitively I feel like I need to come up with A) an identification function that will identify individual points across frames and B) some cost function that will determine how similar tracked points (and the relationship / distance between them) are from frame to frame, but I can't get my head around how to implement this. Alternatively, maybe some kind of map built up from the points would work. But again, I don't know how to approach this.
Any advice (and example code) is highly appreciated!
EDIT1
A simple particle filter would probably work too, but again I don't know how to define the cost function. A particle filter for tracking a certain color is easy to program: for each pixel you calculate the difference between the target color and the pixel color. But how would I do the same for estimating the relationship between tracked points?
EDIT2
Intuitively I feel like Kalman filters could also help with the prediction step. See slides 24 - 32 of this PDF. Or am I misled?
What I think you're trying to do is essentially build up a state space of features, which can be applied to a filtering process, such as an Extended Kalman Filter. This is a useful framework when you have multiple observations in every frame, and you're trying to estimate or measure something indicated by these observations.
To determine the similarity of the tracked points, you can perform simple template matching from frame to frame for small regions around the points. One way of doing this is to extract an NxN (say, 7x7) region around point a in frame n and point a' in frame n+1, followed by normalised cross correlation between the extracted regions. This will give you a reasonable measure of how similar the patches are. If the patches are not similar, then you've probably lost track of that point.
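As a concrete (if simplified) illustration, here's a Java sketch of that patch comparison. It assumes grayscale frames stored as double[][] arrays, a 7x7 patch, and an arbitrary 0.8 similarity cutoff:

/** A minimal sketch of the patch comparison described above: extract an NxN
 *  window around a point in each frame and compute the normalised cross
 *  correlation between the two patches. Images are assumed to be grayscale
 *  arrays indexed as [y][x]; patch size 7 is just an example. */
public class PatchMatcher {

    /** Extracts an n x n patch centered on (cx, cy); caller must ensure it fits. */
    static double[] extractPatch(double[][] image, int cx, int cy, int n) {
        double[] patch = new double[n * n];
        int half = n / 2, k = 0;
        for (int y = cy - half; y <= cy + half; y++)
            for (int x = cx - half; x <= cx + half; x++)
                patch[k++] = image[y][x];
        return patch;
    }

    /** Normalised cross-correlation in [-1, 1]; close to 1 means very similar. */
    static double ncc(double[] a, double[] b) {
        double meanA = 0, meanB = 0;
        for (int i = 0; i < a.length; i++) { meanA += a[i]; meanB += b[i]; }
        meanA /= a.length;
        meanB /= b.length;

        double num = 0, varA = 0, varB = 0;
        for (int i = 0; i < a.length; i++) {
            double da = a[i] - meanA, db = b[i] - meanB;
            num  += da * db;
            varA += da * da;
            varB += db * db;
        }
        double denom = Math.sqrt(varA * varB);
        return denom == 0 ? 0 : num / denom;
    }

    /** Example: decide whether point (x, y) in frame n matches (x2, y2) in frame n+1. */
    static boolean samePoint(double[][] frameN, int x, int y,
                             double[][] frameN1, int x2, int y2) {
        double[] p1 = extractPatch(frameN, x, y, 7);
        double[] p2 = extractPatch(frameN1, x2, y2, 7);
        return ncc(p1, p2) > 0.8; // 0.8 is an arbitrary similarity cutoff
    }
}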
There is an enormous literature on this and related problems, starting in the 1980s. Try searching for "optical flow" algorithms. The input for such algorithms is two successive frames of the same scene. The output is a vector field, one vector per pixel in the second image, which gives the direction and speed of movement of the feature at that pixel. This presentation is a pretty nice summary.
A nice thing about optical flow is that many algorithms for it parallelize nicely and map onto your favorite video card GPU, so they can run in real time. Think ESPN overlays.
In my view, in order to identify who is who in each frame, you will have to use a higher dimension. For example, if you want to know which point is which between two frames (assuming the extracted points are the same), you will have to build vectors or simplexes and then deduce an organisation between your points (like angle values).
The main problem is that the number of combinations grows with the number of points. If your camera is at a fixed position, you could use the background as a reference in order to deduce object rotations and translations; I mean, build vectors between background interest points and object points in order to clearly identify them.
Hope that helps you move forward.
I would recommend looking into the divided difference filter (DDF), which is similar to the extended Kalman filter (EKF), but does not require an approximate model of the dynamics of your system (which you may not have). Basically the DDF approximates the derivatives used in the EKF using a difference equation. There are plenty of papers online about this, but I do not know whether you have access to them, so I have not linked them here. If you are working from a university or a company that has access to online journals (like IEEE Xplore), then just Google "divided difference filter" and check out some of the papers.
I have seen many questions on this site dealing with the concept of pitch detection, but they all deal with this magical FFT with which I am not familiar. I am trying to build an Android application that needs to implement pitch detection. I have absolutely no understanding of the algorithms that are used to do this.
It can't be that hard can it? There are around 8 billion guitar tuner apps on the android market after all.
Can someone help?
The FFT is not really the best way to implement pitch detection or pitch tracking. One issue is that the loudest frequency is not always the fundamental frequency. Another is that the FFT, by itself, requires a pretty large amount of data and processing to obtain the resolution you need to tune an instrument, so it can appear slow to respond (i.e. high latency). Yet another issue is that the result of an FFT is not exactly intuitive to work with: you get an array of complex numbers and you have to know how to interpret them.
If you really want to use an FFT, here is one approach:
Low-pass filter your signal. This will help prevent noise and higher harmonics from creating spurious results. Conceivably, you could skip this step and instead weight your results toward the lower bins of the FFT. For some instruments with strong fundamental frequencies, this might not be necessary.
Window your signal. Windows should be at least 4096 samples in size. Larger is better up to a point because it gives you better frequency resolution; if you go too large, it will end up increasing your computation time and latency. The Hann function is a good choice for your window: http://en.wikipedia.org/wiki/Hann_function
FFT the windowed signal as often as you can. Even overlapping windows are good.
The results of the FFT are complex numbers. Find the magnitude of each complex number using sqrt( real^2 + imag^2 ). The index in the FFT array with the largest magnitude is the index with your peak frequency.
You may want to average multiple FFTs for more consistent results.
How do you calculate the frequency from the index? Well, let's say you've got a window of size N. After you FFT, you will have N complex numbers. If your peak is the nth one, and your sample rate is 44100, then your peak frequency will be near 44100*n/N. Why near? Well, you have an error of up to (44100/2)*1/N. For a window size of 4096, this is about 5.3 Hz -- easily audible at A440. You can improve on that by 1. taking phase into account (I've only described how to take magnitude into account), 2. using larger windows (which will increase latency and processing requirements, as the FFT is an N log N algorithm), or 3. using a better algorithm like YIN: http://www.ircam.fr/pcm/cheveign/pss/2002_JASA_YIN.pdf (A minimal sketch tying these steps together follows below.)
You can skip the windowing step and just break the audio into discrete chunks of however many samples you want to analyze. This is equivalent to using a square window, which works, but you may get more noise in your results.
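To tie the steps together, here's a minimal Java sketch of the whole pipeline. It uses a naive O(N²) DFT purely for readability; in a real application you would substitute a proper FFT library. The window size and sample rate are examples only:

/** A minimal sketch of the steps above: Hann-window a block of samples,
 *  transform it, take magnitudes, and convert the peak bin to a frequency.
 *  For clarity a naive O(N^2) DFT is used here; in a real app you would
 *  substitute a proper FFT library. Window size and sample rate are examples. */
public class PeakPitchDetector {

    /** Returns the frequency (Hz) of the strongest bin in the block. */
    public static double detectPitch(double[] samples, double sampleRate) {
        int n = samples.length;

        // 1) Apply a Hann window to reduce spectral leakage.
        double[] windowed = new double[n];
        for (int i = 0; i < n; i++) {
            double hann = 0.5 * (1 - Math.cos(2 * Math.PI * i / (n - 1)));
            windowed[i] = samples[i] * hann;
        }

        // 2) Naive DFT (stand-in for a real FFT) and 3) magnitude of each bin.
        int bestBin = 1;
        double bestMag = 0;
        for (int k = 1; k <= n / 2; k++) { // only positive frequencies
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += windowed[t] * Math.cos(angle);
                im -= windowed[t] * Math.sin(angle);
            }
            double mag = Math.sqrt(re * re + im * im);
            if (mag > bestMag) { bestMag = mag; bestBin = k; }
        }

        // 4) Bin index -> frequency: bin k corresponds to k * sampleRate / N.
        return bestBin * sampleRate / n;
    }

    public static void main(String[] args) {
        // Quick self-test: a 440 Hz sine at 44100 Hz should report roughly 440 Hz.
        int n = 4096;
        double sampleRate = 44100;
        double[] sine = new double[n];
        for (int i = 0; i < n; i++) sine[i] = Math.sin(2 * Math.PI * 440 * i / sampleRate);
        System.out.println("Detected: " + detectPitch(sine, sampleRate) + " Hz");
    }
}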
BTW: Many of those tuner apps license code from third parties, such as z-plane and iZotope.
Update: If you want C source code and a full tutorial for the FFT method, I've written one. The code compiles and runs on Mac OS X, and should be convertible to other platforms pretty easily. It's not designed to be the best, but it is designed to be easy to understand.
A Fast Fourier Transform changes a function from the time domain to the frequency domain. So instead of f(t), where f is the signal that you are getting from the microphone and t is the time index of that signal, you get g(θ), where g is the FFT of f and θ is the frequency. Once you have g(θ), you just need to find which θ has the highest amplitude, meaning the "loudest" frequency. That will be the primary pitch of the sound that you are picking up.
As for actually implementing the FFT, if you google "fast fourier transform sample code", you'll get a bunch of examples.