I have recently been working on a program which simulates gravitational interactions between planets. The physics has been implemented and the program is fully functional.
However, the next feature to add is tracing each planet's path, and I cannot think of a good way to do this.
The first way I thought of was to use particle effects; however, that would require an inordinate number of particles, and I believe performance would suffer badly.
The other way is to draw a line between each pair of successive positions a planet has passed through. However, this causes the number of lines drawn to rise to an unreasonably high number, so performance greatly suffers, in some cases by as much as 1000%. Even when limiting the lines to a relatively low count of 300 per orbit, performance drops dramatically.
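For what it's worth, the capped line-trail variant described above usually boils down to a small ring buffer of recent positions, sketched here with plain floats rather than any particular engine's vector type (the capacity of 300 matches the per-orbit limit mentioned above):

// Fixed-capacity ring buffer of recent trail points: old points are overwritten,
// so the number of line segments drawn per planet never exceeds the capacity.
class Trail {
    final float[] xs, ys;
    final int capacity;
    int head = 0, size = 0;

    Trail(int capacity) {                 // e.g. 300 points per orbit
        this.capacity = capacity;
        this.xs = new float[capacity];
        this.ys = new float[capacity];
    }

    void record(float x, float y) {
        xs[head] = x;
        ys[head] = y;
        head = (head + 1) % capacity;     // overwrite the oldest point when full
        if (size < capacity) size++;
    }
    // Rendering then draws size - 1 line segments between consecutive stored points.
}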
Thank you for your help.
Related
Before adding the detune variable, the physics update behaved strikingly differently on a computer and on a smartphone.
After adding and multiplying in some variables, the difference was smoothed out, but not eliminated completely.
I'm asking for help with this, because I cannot figure out what to do myself.
public void update(float dt, Camera cam) {
    float detune = dt / 0.01666f;
    if (!ignoreGraviry)
        attraction.add(getGravity().cpy().scl(detune));
    float lx = 1 - .09f * detune;
    float ly = 1 - .015f * detune;
    attraction.scl(lx, ly);
    Vector v = getMotion().scl(lx, ly).cpy();
    lastPos = getPosition().cpy();
    getPosition().add(
            v.rotate(cam.getRotation()).add(
                    attraction.cpy().rotate(cam.getRotation())
            ).scl(dt));
}
Problem
What detune does is scale up the effect of one simulation cycle by a factor of 60. So instead of having to simulate 60 cycles, this only has to simulate 1 cycle. But the results will be more inaccurate, maybe only a bit, maybe a lot, depending on whether the rest of the simulation is stable/converging or not.

Also, with lx and ly, the way this detune is done LOOKS awfully bad (it MIGHT be OK with some outside knowledge that your question does not provide), because you should never combine linear scaling effects with addition. This will throw you into hell's pit faster than you can imagine. lx, for example, will take negative or positive values depending on dt. dt usually is the 'delta time' and lets you adjust the granularity vs. speed of the simulation. So if someone adjusts dt and all of a sudden the simulation runs backwards, this will become a real sore issue.
Solution
You should NOT have detune in your code like this. Better to increase the dt value instead. Ensure that calculation cycles have the same temporal distance on PCs and smartphones, for example 30 times a second (30 fps, dt = 33 ms), and sleep for the rest of the time. If you cannot guarantee that, simulation results will always differ between them, bringing advantages or disadvantages to one or the other.
I do not know whether libgdx enforces a fixed simulation-graphics cycle, i.e. exactly one simulation step per graphics update. But in most engines (yes, especially games, which is why extra threads/cores are usually of little use there) the two are heavily coupled. This is really bad, because then you have to restrict both your simulation algorithm AND your graphics updates to the lowest common hardware across PCs AND phones, i.e. to the worst graphical AND computational minimum requirements at the same time.
If you can decouple simulation and graphics, you only have to consider the lowest computational capabilities for the simulation. Concerning graphics, you could always run at the maximum frame rate each system can manage (or cap it at 90 fps; only very few people can perceive more), making the best of the graphics hardware and getting the smoothest rendering.
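As a rough illustration of that decoupling, here is a minimal fixed-timestep loop sketch (not tied to libgdx; Simulation and Renderer are placeholder interfaces I made up for the example):

// Fixed-timestep loop: the simulation advances in constant dt steps on every device,
// while rendering runs as often as the hardware allows.
public class FixedStepLoop {
    interface Simulation { void step(float dt); }
    interface Renderer   { void render(); }

    public static void run(Simulation sim, Renderer renderer) throws InterruptedException {
        final float dt = 1f / 30f;                       // 30 physics steps per second everywhere
        long previous = System.nanoTime();
        double accumulator = 0.0;

        while (true) {
            long now = System.nanoTime();
            accumulator += (now - previous) / 1_000_000_000.0;
            previous = now;

            while (accumulator >= dt) {                  // catch up with fixed-size steps
                sim.step(dt);
                accumulator -= dt;
            }

            renderer.render();                           // as fast as the display allows (or capped)
            Thread.sleep(1);                             // yield a little CPU to the rest of the system
        }
    }
}

With this structure, the physics sees the same fixed dt on every device; only the rendering rate differs.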
All I know is that delta relates somehow to adapting to different frame rates, but I'm not sure exactly what it stands for and how to use it in the math that calculates speeds and what not.
Where is delta declared? initialized?
How is it used? How are its values (min,max) set?
It's the number of milliseconds between frames. Rather than trying to build your game on a fixed number of milliseconds between frames, you want to alter your game to move/update/adjust each element/sprite/AI based on how much time has passed since the last time the update method came around. This is the case with pretty much all game engines, and it allows you to avoid having to change your game logic based on the power of the hardware you're running on.
Slick also has mechanisms for setting the minimum update times, so you have a way to guarantee that the delta won't be smaller than a certain amount. This allows your game to basically say, "Don't update more often than every 'x' milliseconds," because if you're running on powerful hardware, and have a very tight game loop, it's theoretically possible to get sub-millisecond deltas which starts to produce strange side effects, such as slow movement, or collision detection that doesn't seem to work the way you expect.
Setting a minimum update time also allows you to minimize recalculating unnecessarily, when only a very, very small amount of time has passed.
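As a rough sketch of what that delta-based movement looks like (the Sprite class and speed value here are made up; Slick passes delta in milliseconds to its update method):

// Move at a fixed speed in pixels per second, regardless of frame rate.
class Sprite {
    float x;                               // current horizontal position in pixels
    float speedPerSecond = 120f;           // made-up example speed: 120 pixels per second

    // delta is the elapsed time since the last update, in milliseconds.
    void update(int delta) {
        float seconds = delta / 1000f;     // convert milliseconds to seconds
        x += speedPerSecond * seconds;     // same on-screen speed on fast and slow hardware
    }
}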
Have a read of the LWJGL timing tutorial found here. It's not strictly Slick, but it will explain what the delta value is and how to use it.
I need some help confirming some basic DSP steps. I'm in the process of implementing some smartphone accelerometer sensor signal processing software, but I've not worked in DSP before.
My program collects accelerometer data in real time at 32 Hz. The output should be the principal frequencies of the signal.
My specific questions are:
From the real-time stream, I am collecting a 256-sample window with 50% overlap, as I've read in the literature. That is, I add in 128 samples at a time to fill up a 256-sample window. Is this a correct approach?
The first figure below shows one such 256-sample window. The second figure shows the sample window after I applied a Hann/Hamming window function. I've read that applying a window function is a typical approach, so I went ahead and did it. Should I be doing so?
The third figure shows the power spectrum (?) from the output of an FFT library. I am really cobbling together bits and pieces I've read. Am I correct in understanding that the spectrum goes up to 1/2 the sampling rate (in this case 16 Hz, since my sampling rate is 32 Hz), and that the value of each spectrum point is spectrum[i] = sqrt(real[i]^2 + imaginary[i]^2)? Is this right?
Assuming what I did in question 3 is correct, is my understanding right that the third figure shows principal frequencies of about 3.25 Hz and 8.25 Hz? I know from collecting the data that I was running at about 3 Hz, so the spike at 3.25 Hz seems right. So there must be some noise or other factors causing the (erroneous) spike at 8.25 Hz. Are there any filters or other methods I can use to smooth away this and other spikes? If not, is there a way to distinguish "real" spikes from erroneous ones?
Making a decision on sample size and overlap is always a compromise between frequency accuracy and timeliness: the bigger the sample, the more FFT bins and hence absolute accuracy, but it takes longer. I'm guessing you want regular updates on the frequency you're detecting, and absolute accuracy is not too important: so a 256 sample FFT seems a pretty good choice. Having an overlap will give a higher resolution on the same data, but at the expense of processing: again, 50% seems fine.
Applying a window will stop frequency artifacts appearing due to the abrupt start and finish of the sample (you are effectively applying a square window if you do nothing). A Hamming window is fairly standard as it gives a good compromise between having sharp signals and low side-lobes: some windows will reject the side-lobes better (multiples of the detected frequency) but the detected signal will be spread over more bins, and others the opposite. On a small sample size with the amount of noise you have on your signal, I don't think it really matters much: you might as well stick with a Hamming window.
Exactly right: the spectrum value is the square root of the sum of the squares of the real and imaginary parts (strictly speaking that is the magnitude spectrum; the power spectrum is its square, but for peak detection either will do). Your assumption about the Nyquist frequency is also true: your scale will go up to 16 Hz. I assume you are using a real FFT algorithm, which returns 128 complex values (an FFT will give 256 values back, but because you are giving it a real signal, half will be an exact mirror image of the other half), so each bin is 16/128 Hz wide. It is also common to show the power spectrum on a log scale, but that's irrelevant if you're just peak detecting.
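As a sketch of questions 2 and 3 combined, assuming a 256-sample window and whatever in-place FFT routine you have available (the fft parameter below is a placeholder, not a specific library's API):

import java.util.function.BiConsumer;

// Apply a Hann window, run an FFT, and convert the first half of the output
// (bins 0..127, covering 0..16 Hz) into magnitudes.
class SpectrumSketch {
    static final int N = 256;              // window size in samples
    static final float FS = 32f;           // sampling rate in Hz

    static float[] magnitudeSpectrum(float[] samples, BiConsumer<float[], float[]> fft) {
        float[] real = new float[N];
        float[] imag = new float[N];
        for (int i = 0; i < N; i++) {
            // Hann window: softens the abrupt start and end of the sample block
            double w = 0.5 * (1 - Math.cos(2 * Math.PI * i / (N - 1)));
            real[i] = (float) (samples[i] * w);
        }
        fft.accept(real, imag);             // placeholder for any in-place complex FFT

        float[] mag = new float[N / 2];
        for (int k = 0; k < N / 2; k++) {
            mag[k] = (float) Math.sqrt(real[k] * real[k] + imag[k] * imag[k]);
            // bin k sits at k * FS / N Hz, i.e. steps of 0.125 Hz here
        }
        return mag;
    }
}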
The 8Hz spike really is there: my guess is that a phone in a pocket of a moving person is more than a 1st order system, so you are going to have other frequency components, but should be able to detect the primary. You could filter it out, but that's pointless if you are taking an FFT: just ignore those bins if you are sure they are erroneous.
You seem to be getting on fine. The only suggestion I would make is to develop some longer time heuristics on the results: look at successive outputs and reject short-term detected signals. Look for a principal component and see if you can track it as it moves around.
To answer a few of your questions:
Yes, you should be applying a window function. The idea here is that when you start and stop sampling a real-world signal, what you're doing anyway is applying a sharp rectangular window. Hann and Hamming windows are much better at reducing frequencies you don't want, so this is a good approach.
Yes, the strongest frequencies are around 3 and 8 Hz. I don't think the 8 Hz spike is erroneous. With such a short data set you almost certainly can't control the exact frequencies your signal will have.
Some insight on question 4 (from staring at accelerometer signals of people running for months of my life):
Are you running this analysis on a single accelerometer axis channel, or are you combining them to form the magnitude of acceleration? If you are interested in the overall magnitude of the acceleration signal, then you should combine x, y, and z, e.g. mag_acc = sqrt((x - 0g_offset)^2 + (y - 0g_offset)^2 + (z - 0g_offset)^2). This signal should sit at 1 g when the device is still. If you are only looking at a single axis, you will get components both from the dominant running motion and from the changing orientation of the phone (because the contribution from gravity will be shifting around between axes). So if the phone's orientation moves around while you are running, depending on how you are holding it, that can contribute a significant amount to the signal, whereas the magnitude will not show the orientation changes nearly as much. A person running should have a really clean dominant frequency at the person's step rate.
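A one-line version of that combination, as a sketch (the zero-g offset is device-specific; the constant below is just a stand-in for your calibration value):

// Combine the three axes into one orientation-insensitive magnitude signal.
class AccelMagnitude {
    static final float ZERO_G_OFFSET = 0f;   // stand-in; use your device's calibrated offset per axis

    static float magnitude(float x, float y, float z) {
        float dx = x - ZERO_G_OFFSET;
        float dy = y - ZERO_G_OFFSET;
        float dz = z - ZERO_G_OFFSET;
        return (float) Math.sqrt(dx * dx + dy * dy + dz * dz);   // about 1 g when the device is still
    }
}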
After watching the presentation "Performance Anxiety" by Joshua Bloch, I read the paper he suggested in it, "Evaluating the Accuracy of Java Profilers". Quoting the conclusion:
Our results are disturbing because they indicate that profiler incorrectness is pervasive—occurring in most of our seven benchmarks and in two production JVM—and significant—all four of the state-of-the-art profilers produce incorrect profiles. Incorrect profiles can easily cause a performance analyst to spend time optimizing cold methods that will have minimal effect on performance. We show that a proof-of-concept profiler that does not use yield points for sampling does not suffer from the above problems.
The conclusion of the paper is that we cannot really believe the results of profilers. But then, what is the alternative to using profilers? Should we go back to just using our gut feeling to do optimization?
UPDATE: A point that seems to be missed in the discussion is the observer effect. Can we build a profiler that is really 'observer effect'-free?
Oh, man, where to begin?
First, I'm amazed that this is news. Second, the problem is not that profilers are bad, it is that some profilers are bad.
The authors built one that, they feel, is good, just by avoiding some of the mistakes they found in the ones they evaluated.
Mistakes are common because of some persistent myths about performance profiling.
But let's be positive.
If one wants to find opportunities for speedup, it is really very simple:
Sampling should be uncorrelated with the state of the program.
That means happening at a truly random time, regardless of whether the program is in I/O (except for user input), or in GC, or in a tight CPU loop, or whatever.
Sampling should read the function call stack,
so as to determine which statements were "active" at the time of the sample.
The reason is that every call site (point at which a function is called) has a percentage cost equal to the fraction of time it is on the stack.
(Note: the paper is concerned entirely with self-time, ignoring the massive impact of avoidable function calls in large software. In fact, the reason behind the original gprof was to help find those calls.)
Reporting should show percent by line (not by function).
If a "hot" function is identified, one still has to hunt inside it for the "hot" lines of code accounting for the time. That information is in the samples! Why hide it?
An almost universal mistake (that the paper shares) is to be concerned too much with accuracy of measurement, and not enough with accuracy of location.
For example, here is an account of performance tuning
in which a series of performance problems were identified and fixed, resulting in a compounded speedup of 43 times.
It was not essential to know precisely the size of each problem before fixing it, but to know its location.
A phenomenon of performance tuning is that fixing one problem, by reducing the time, magnifies the percentages of remaining problems, so they are easier to find.
As long as any problem is found and fixed, progress is made toward the goal of finding and fixing all the problems.
It is not essential to fix them in decreasing size order, but it is essential to pinpoint them.
On the subject of statistical accuracy of measurement, if a call point is on the stack some percent of time F (like 20%), and N (like 100) random-time samples are taken, then the number of samples that show the call point is a binomial distribution, with mean = NF = 20, standard deviation = sqrt(NF(1-F)) = sqrt(16) = 4. So the percent of samples that show it will be 20% +/- 4%.
So is that accurate? Not really, but has the problem been found? Precisely.
In fact, the larger a problem is, in terms of percent, the fewer samples are needed to locate it. For example, if 3 samples are taken, and a call point shows up on 2 of them, it is highly likely to be very costly.
(Specifically, it follows a beta distribution. If you generate 4 uniform 0,1 random numbers, and sort them, the distribution of the 3rd one is the distribution of cost for that call point.
Its mean is (2+1)/(3+2) = 0.6, so that is the expected savings, given those samples.)
INSERTED: And the speedup factor you get is governed by another distribution, BetaPrime, and its average is 4. So if you take 3 samples, see a problem on 2 of them, and eliminate that problem, on average you will make the program four times faster.
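A small Monte Carlo check of that claim (a standalone toy I wrote for illustration, not from the paper): draw 4 uniform numbers, sort them, and look at the 3rd one.

import java.util.Arrays;
import java.util.Random;

// The cost fraction of a call point seen on 2 of 3 samples behaves like the
// 3rd of 4 sorted uniform(0,1) numbers; its mean should come out near 0.6.
public class ThreeSampleSketch {
    public static void main(String[] args) {
        Random rng = new Random(42);
        int trials = 1_000_000;
        double sumCost = 0;
        for (int t = 0; t < trials; t++) {
            double[] u = { rng.nextDouble(), rng.nextDouble(), rng.nextDouble(), rng.nextDouble() };
            Arrays.sort(u);
            sumCost += u[2];                               // the 3rd smallest of the 4
        }
        System.out.printf("mean cost fraction = %.3f (expect 0.600)%n", sumCost / trials);
    }
}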
It's high time we programmers blew the cobwebs out of our heads on the subject of profiling.
Disclaimer - the paper failed to reference my article: Dunlavey, “Performance tuning with instruction-level cost derived from call-stack sampling”, ACM SIGPLAN Notices 42, 8 (August, 2007), pp. 4-8.
If I read it correctly, the paper only talks about sample-based profiling. Many profilers also do instrumentation-based profiling. It's much slower and has some other problems, but it should not suffer from the biases the paper talks about.
The conclusion of the paper is that we cannot really believe the results of profilers. But then, what is the alternative to using profilers?
No. The conclusion of the paper is that current profilers' measuring methods have specific defects. They propose a fix. The paper is quite recent. I'd expect profilers to implement this fix eventually. Until then, even a defective profiler is still much better than "feeling".
Unless you are building bleeding-edge applications that need every CPU cycle, I have found that profilers are a good way to find the 10% slowest parts of your code. As a developer, I would argue that this should be all you really care about in nearly all cases.
I have experience with http://www.dynatrace.com/en/ and I can tell you it is very good at finding the low-hanging fruit.
Profilers are like any other tool and they have their quirks but I would trust them over a human any day to find the hot spots in your app to look at.
If you don't trust profilers, then you can go into paranoia mode by using aspect-oriented programming, wrapping every method in your application and then using a logger to log every method invocation.
Your application will really slow down, but at least you'll have a precise count of how many times each method is invoked. If you also want to see how long each method takes to execute, wrap every method with perf4j.
After dumping all these statistics to text files, use some tools to extract all necessary information and then visualize it. I'd guess this will give you a pretty good overview of how slow your application is in certain places.
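The hand-rolled equivalent of that wrapping looks roughly like this (my own sketch; perf4j and AOP interceptors just automate it for every method):

import java.util.function.Supplier;

// Wrap a call, measure how long it took, and log it.
public class TimedCall {
    public static <T> T timed(String name, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println(name + " took " + micros + " us");
        }
    }

    public static void main(String[] args) {
        int result = timed("expensiveComputation", () -> {
            int sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            return sum;
        });
        System.out.println("result = " + result);
    }
}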
Actually, you are better off profiling at the database level. Most enterprise databases come with the ability to show the top queries over a period of time. Start working on those queries until the top ones are down to 300 ms or less, and you will have made great progress. Profilers are useful for showing behavior of the heap and for identifying blocked threads, but I personally have never gotten much traction with the development teams on identifying hot methods or large objects.
I have been working on a childish little program: there are a bunch of little circles on the screen, of different colors and sizes. When a larger circle encounters a smaller circle it eats the smaller circle, and when a circle has eaten enough other circles it reproduces. It's kind of neat!
However, the way I have it implemented, the process of detecting nearby circles and checking them for edibility is done with a for loop that cycles through the entire living population of circles... which takes longer and longer as the population tends to spike into the 3000s before it starts to drop. The process doesn't slow my computer down; I can go off and play Dawn of War or whatever and there isn't any slowdown. It's just the process of checking every circle against every other circle for collisions...
So what occurred to me is that I could try to separate the application window into four quadrants and have the circles in each quadrant do their checks simultaneously, since they would have almost no chance of interfering with each other, or something to that effect!
My question, then, is: how does one make for loops that run side by side? In Java, say.
The problem you have here can actually be solved without threads.
What you need is a spatial data structure. A quadtree would be best, or, if the field in which the circles move is fixed (I assume it is), you could use a simple grid. Here's the idea.
Divide the display area into a square grid where each cell is at least as big as your biggest circle. For each cell, keep a list (a linked list is best) of all the circles whose center is in that cell. Then, during the collision detection step, go through each cell and check each circle in that cell against all the other circles in that cell and the surrounding cells (see the sketch below).
Technically, you don't have to check all the cells around each one, as some of them will already have been checked.
You can combine this technique with multithreading to get even better performance.
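A minimal sketch of that grid (the Circle fields are assumptions about the asker's code; the cell size should be at least the diameter of the biggest circle so no colliding pair can be missed):

import java.util.ArrayList;
import java.util.List;

// Uniform-grid broad phase: each circle is bucketed by the cell containing its
// center, and collision checks only look at that cell and its 8 neighbours.
class CollisionGrid {
    static class Circle { float x, y, radius; }           // assumed shape of the asker's circles

    final int cols, rows;
    final float cellSize;
    final List<List<Circle>> cells;

    CollisionGrid(float width, float height, float cellSize) {
        this.cellSize = cellSize;
        this.cols = (int) Math.ceil(width / cellSize);
        this.rows = (int) Math.ceil(height / cellSize);
        this.cells = new ArrayList<>(cols * rows);
        for (int i = 0; i < cols * rows; i++) cells.add(new ArrayList<>());
    }

    void insert(Circle c) {
        cells.get(cellIndex(c.x, c.y)).add(c);
    }

    // All circles in the cell containing (x, y) plus the 8 surrounding cells.
    List<Circle> nearby(float x, float y) {
        List<Circle> result = new ArrayList<>();
        int cx = clamp((int) (x / cellSize), cols - 1);
        int cy = clamp((int) (y / cellSize), rows - 1);
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++) {
                int nx = cx + dx, ny = cy + dy;
                if (nx >= 0 && nx < cols && ny >= 0 && ny < rows)
                    result.addAll(cells.get(ny * cols + nx));
            }
        return result;
    }

    private int cellIndex(float x, float y) {
        return clamp((int) (y / cellSize), rows - 1) * cols
                + clamp((int) (x / cellSize), cols - 1);
    }

    private static int clamp(int v, int max) { return Math.max(0, Math.min(v, max)); }
}

Each frame, clear the grid, reinsert every circle, and then test each circle only against nearby(circle.x, circle.y) instead of the whole population.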
Computers are usually single-tasking; this means they can usually execute one instruction at a time per CPU or core.
However, as you have noticed, your operating system (and other programs) appear to run many tasks at the same time.
This is accomplished by splitting the work into processes, and each process can further implement concurrency by spawning threads. The operating system then switches between each process and thread very quickly to give the illusion of multitasking.
In your situation, your Java program is a single process, and you would need to create 4 threads, each running its own loop. It can get tricky, because threads need to synchronize their access to shared variables, to prevent one thread editing a variable while another thread is trying to access it.
Because threading is a complex subject it would take far more explaining than I can do here.
However, you can read Sun's excellent tutorial on concurrency, which covers everything you need to know:
http://java.sun.com/docs/books/tutorial/essential/concurrency/
What you're looking for is not a way to have these run simultaneously (as people have noted, this depends on how many cores you have, and can only offer a 2x or maybe 4x speedup), but instead to somehow cut down on the number of collisions you have to detect.
You should look into using a quadtree. In brief, you recursively break down your 2D region into four quadrants (as needed), and then only need to detect collisions between objects in nearby components. In good cases, it can effectively reduce your collision detection time from N^2 to N * log N.
Instead of trying to do parallel processing, you may want to look at optimizing the collision detection itself. In many situations, performing fewer calculations in one thread is better than distributing the calculations among multiple threads, plus it's easy to shoot yourself in the foot in this multi-threading business. Try googling "collision detection algorithm" and see where it gets you ;)
If your computer has multiple processors or multiple cores, then you could easily run multiple threads and run smaller parts of the loop in each thread. Many PCs these days do have multiple cores, so have it so that each thread gets 1/nth of the loop count, and then create n threads.
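A rough sketch of that split using an ExecutorService (checkCollisionsFor below is a placeholder for whatever per-circle work the asker's loop does):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Split one big loop over all circles into n chunks and run them on a thread pool.
public class ParallelLoop {
    static void checkCollisionsFor(int circleIndex) { /* placeholder per-circle work */ }

    public static void main(String[] args) throws InterruptedException {
        int circleCount = 3000;
        int n = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(n);

        int chunk = (circleCount + n - 1) / n;             // ceiling of circleCount / n
        CountDownLatch done = new CountDownLatch(n);
        for (int t = 0; t < n; t++) {
            final int from = t * chunk;
            final int to = Math.min(from + chunk, circleCount);
            pool.submit(() -> {
                for (int i = from; i < to; i++) checkCollisionsFor(i);
                done.countDown();                          // signal this chunk is finished
            });
        }
        done.await();                                      // wait for every chunk before using the results
        pool.shutdown();
    }
}

Note that this only helps if the per-circle work does not write to shared state; otherwise you need synchronization, which is where multi-threading gets tricky.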
If you really want to get into concurrent programming, you need to learn how to use threads.
Sun has a tutorial for programming Java threads here:
http://java.sun.com/docs/books/tutorial/essential/concurrency/
This sounds quite similar to an experiment of mine - check it out...
http://tinyurl.com/3fn8w8
I'm also interested in quadtrees (which is why I'm here)... hope you figured it all out.