I want to write a Java program that uses the fast Fourier transform (FFT).
The program reads data from sensors every 5 milliseconds and, every 200 milliseconds, is supposed to do something based on the data from the last five seconds.
Is there a good Java library that provides a way to do the Fourier transform without recalculating the entire five-second window every time?
Hard real-time problems are not a proper application of Java. There are too many variables, such as garbage collection and threads not being guaranteed to run within a given interval, for this to be feasible. If "close enough" is acceptable, it will work. The timing behavior of your software will also depend on the OS and hardware you are using and on what other programs are running on the same box.
There is a Real-Time Java, which does have a special API for the issues mentioned above, but you do not indicate that you are using it. It is also a different animal in many respects from plain Java.
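As for the Fourier part of the question: even without a dedicated library, a sliding DFT avoids recomputing the whole five-second window on every update, since each new sample adjusts every bin in O(N) work. Below is a minimal sketch in plain Java, not a library recommendation; the class and method names are made up here, and it assumes one sample every 5 ms, i.e. a 1000-sample window.

public class SlidingDft {
    private final int n;            // window length in samples (e.g. 1000 = 5 s at 5 ms/sample)
    private final double[] re, im;  // current DFT bins (real and imaginary parts)
    private final double[] window;  // circular buffer holding the last n samples
    private int pos;                // index of the oldest sample in the buffer

    public SlidingDft(int n) {
        this.n = n;
        this.re = new double[n];
        this.im = new double[n];
        this.window = new double[n];
    }

    // Called once per incoming sample; updates every bin in O(n).
    public void push(double sample) {
        double oldest = window[pos];
        window[pos] = sample;
        pos = (pos + 1) % n;
        double delta = sample - oldest;
        for (int k = 0; k < n; k++) {
            // X_k <- (X_k + x_new - x_old) * e^(j*2*pi*k/n)
            double angle = 2.0 * Math.PI * k / n;
            double c = Math.cos(angle), s = Math.sin(angle);
            double tr = re[k] + delta;
            double ti = im[k];
            re[k] = tr * c - ti * s;
            im[k] = tr * s + ti * c;
        }
    }

    // Magnitude of bin k, e.g. queried every 200 ms.
    public double magnitude(int k) {
        return Math.hypot(re[k], im[k]);
    }
}

Every 200 ms you then read whichever bins you need instead of running a fresh FFT over all 1000 samples.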
The context
I needed to write a quick tool to extract ASCII data files directly from a collection of *.zip files into Matlab memory. The files are large, so I wanted to leave the on-disk storage alone.
Using the Matlab Java integration, this was relatively straightforward. Hat tip to this excellent answer; option 7 is easy to implement in Matlab using Java objects.
The strange timing result
The oddness appeared when I was trying to optimize for speed, which led to the following timing script.
zipPath = "\\full\path\to\zipfile.zip";
zipJavaFile = java.io.File(zipPath);
tic
zipFile_apache = org.apache.tools.zip.ZipFile(zipJavaFile);
toc %Typical time value: 2 sec. Typical range: 2 -- 8 seconds.
tic
zipFile_util = java.util.zip.ZipFile(zipJavaFile);
toc %Typical time value: 0.007 sec. Typical range: 0.002 -- 0.015 seconds.
The timing difference between the java.util.zip and org.apache.tools.zip libraries is very large: milliseconds vs. seconds.
For the current task, changing that one line of code improved the full data read time, spread across multiple files, from 200 s to 5 s.
The general trend persists if I rerun the timing script (to prime any OS caching), and it also persists if I change the order of the test cases.
Both toolsets lead to correct data extraction, with no noticeable difference in speed after the lines shown.
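For reference, the read path through java.util.zip looks roughly like the Java sketch below (the path and charset are placeholders); the same pattern works from Matlab by calling the equivalent Java objects, and nothing is ever written back to disk.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipDump {
    public static void main(String[] args) throws Exception {
        try (ZipFile zip = new ZipFile("\\\\full\\path\\to\\zipfile.zip")) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) continue;
                // Stream each ASCII data file straight into memory.
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(zip.getInputStream(entry),
                                              StandardCharsets.US_ASCII))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        // parse / collect the line as needed
                    }
                }
            }
        }
    }
}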
The question
I would have expected similar timing performance from two Java zip toolsets, and I cannot find any other reference to a time difference this large.
I don't understand where the time difference comes from. In a slightly different task (e.g. one without two parallel libraries to compare) I might have been stuck with the slower performance without knowing why.
Why is there such a speed difference between these libraries in this context?
I have a question: I understand that an FPGA takes advantage of hardware parallelism and controls I/O at the hardware level in order to provide faster response times, but what are the software benefits of an FPGA? Which software components can be accelerated? Thanks in advance.
Both for prototyping and for parallelism. Since FPGAs are cheap, they are good candidates both for industrial prototypes and for parallel systems. FPGAs consist of arrays of logic elements connected by wires; the elements contain small lookup tables and flip-flops, and FPGAs scale to thousands of lookup tables. Lookup tables and programmable wires are flexible enough to implement any logic function. Once you have the function ready, you might then want to move to an ASIC. Xilinx and Altera are the major brands. Personally I use the Altera DE2 and DE2-115.
You are correct about the parallelism and I/O control of an FPGA. FPGAs are just huge re-configurable logic circuits that allow a developer to create circuits with very specific, dedicated functions, and they typically come with a very large amount of I/O compared to typical microcontrollers. Because an FPGA is basically a bunch of gates in silicon, everything you describe in your hardware description language (HDL) happens in parallel. This, combined with the fact that you can build custom circuits, is what gives an FPGA the ability to accelerate operations over a typical processor even though the processor may have a much higher clock speed.

To illustrate the point, let's say you have an algorithm to run and you want to compare an FPGA to a processor. The FPGA is clocked at 100 MHz and the processor at 3 GHz, so the processor runs 30 times faster than the FPGA. Say you design a circuit that computes the algorithm on the FPGA in 10 clock cycles; the equivalent algorithm on a processor could take thousands of instructions to execute. That places the FPGA far ahead of the processor in terms of performance. And because of the parallel nature of FPGAs, if you design the circuit right and the flow through the FPGA is continuous, the FPGA can finish the computation of a new result every clock cycle. Every stage in the FPGA executes concurrently, so it may take 10 clock cycles end to end, but at each clock cycle a different piece of 10 different results can be computed simultaneously (this is called pipelining: http://en.wikipedia.org/wiki/Pipeline_%28computing%29 ). A processor is not capable of this and cannot compute the different stages in parallel without taking extra time to do so.

Processors are also bounded in performance by their instruction set, whereas on an FPGA this can be overcome by good, application-specific circuit design. A processor is a very general design that can run any combination of instructions, so it takes longer to compute things because of that generality. FPGAs also avoid the overhead of moving data in and out of cache or RAM; they typically do use RAM, but in a parallel fashion that does not bottleneck the flow of processing. It is also interesting to note that a processor can be created and implemented on an FPGA, because you can implement the circuits that compose a processor.
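To put rough numbers on that comparison (taking 3,000 instructions, at one instruction per cycle, purely as an illustration): the FPGA needs 10 cycles at 10 ns each, i.e. 100 ns, while the processor needs roughly 3,000 cycles at about 0.33 ns each, i.e. about 1,000 ns; and once the FPGA's pipeline is full, it can also deliver a further result every 10 ns.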
Typically, you find FPGAs on a board alongside processors or microcontrollers to speed up very math-intensive or digital signal processing (DSP) tasks, or tasks that require a large flow of data. For example, a wireless modem that communicates via RF has to do quite a bit of DSP to pick signals out of the air and decode them, something like the receiver chip in a cell phone. There is a lot of data continually flowing in and out of the device. This is perfect for an FPGA because it can process such a large amount of data in parallel. The FPGA can then pass the decoded data off to a microcontroller to do things like display the text of a text message on a pretty touchscreen.

Note that the chip in your cell phone is not an FPGA but an ASIC (application-specific integrated circuit): a bunch of circuits stripped down to the bare minimum for maximum performance and minimal cost. However, it is easy to prototype ASICs on FPGAs, and they are very similar. An FPGA can be wasteful because it can have a lot of on-board resources that are not needed. Generally you only move from an FPGA to an ASIC if you are going to produce a ton of them and you know they work perfectly and will only ever have to do exactly what they currently do. Cell phone transceiver chips are perfect for this: they sell millions and millions of them, and they only have to do that one thing their entire life.
On something like a desktop processor, it is common to see the term "hardware acceleration". This generally means that an FPGA or ASIC is on board to speed up certain operations. Often this means an ASIC (probably on the processor die) is included to handle things such as floating-point math, encryption, hashing, signal processing, or string processing. This allows the processor to work more efficiently by offloading operations that are known to be difficult and time-consuming for a processor. The circuit on the die, ASIC, or FPGA can do the computation in parallel while the processor does something else, and then hand the answer back. The speed-up can be very large because the processor is not bogged down with the computation and is free to continue processing other things while the other circuit performs the operation.
Some think of FPGAs as an alternative to processors. This is usually fundamentally wrong.
FPGAs are customizable logic. You can implement a processor in an FPGA and then run your regular software on that processor.
The key advantage with FPGAs is the flexibility to implement any kind of logic.
Think of a processor system. It might have a serial port, USB, and Ethernet. What if you need another, more specialized interface that your processor system does not support? You would need to change your hardware, possibly by creating a new ASIC.
With an FPGA you can implement a new interface without the need for new hardware or ASICs.
FPGAs are almost never used to replace a processor. The FPGA is used for particular tasks, such as implementing a communications interface, speeding up a specific operation, or switching high-bandwidth communication traffic. You still run your software on a CPU.
I am trying to compare the accuracy of timing methods with C++ and Java.
With C++ I usually use clock() and CLOCKS_PER_SEC: I run the block of code I want to time for a certain amount of time and then calculate how long it took, based on how many times the block was executed.
With Java I usually use System.nanoTime().
Which one is more accurate, the one I use for C++ or the one I use for Java? Is there any other way to measure time in C++ so that I don't have to repeat the piece of code to get a proper measurement? Basically, is there a System.nanoTime() equivalent for C++?
I am aware that both use system calls, which cause considerable latency. How much does this distort the measured value? Is there any way to prevent it?
Every method has errors. Before you spend a great deal of time on this question, you have to ask yourself "how accurate do I need my answer to be"? Usually the solution is to run a loop / piece of code a number of times, and keep track of the mean / standard deviation of the measurement. This is a good way to get a handle on the repeatability of your measurement. After that, assume that latency is "comparable" between the "start time" and "stop time" calls (regardless of what function you used), and you have a framework to understand the issues.
Bottom line: the clock() function typically gives microsecond accuracy.
See https://stackoverflow.com/a/20497193/1967396 for an example of how to go about this in C (in that instance, using a microsecond-precision clock). There is also the ability to use nanosecond timing; see for example the answer to "clock_gettime() still not monotonic - alternatives?", which uses clock_gettime(CLOCK_MONOTONIC_RAW, &tSpec);
Note that you have to extract seconds and nanoseconds separately from that structure.
Be careful using System.nanoTime() as it is still limited by the resolution that the machine you are running on can give you.
Also, there are complications when timing Java: the first few passes through a function will be a lot slower until the JIT optimizes them for your system.
Virtually all modern systems use pre-emptive multithreading and multiple cores, so all timings will vary from run to run (for example, if control gets switched away from your thread while it is in the method).
To get reliable timings you need to:
Warm up the system by running the thing you are timing a few hundred times before you start measuring.
Run the code a good number of times and average the results.
The reliability issues are the same for any language, so they apply just as well to C as to Java. C may not need the warm-up loop, but you will still need to take a lot of samples and average them.
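A minimal Java sketch of that recipe, where workload() is just a stand-in for the code being timed and the volatile sink keeps the JIT from optimizing the work away:

public class NanoTimeBenchmark {
    static volatile long sink;  // prevents dead-code elimination of the workload

    static long workload() {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) sum += (long) i * i;
        return sum;
    }

    public static void main(String[] args) {
        // 1. Warm up so the JIT has compiled and optimized the code under test.
        for (int i = 0; i < 1_000; i++) sink = workload();

        // 2. Take many samples, then report mean and standard deviation.
        final int samples = 200;
        double[] times = new double[samples];
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            sink = workload();
            times[i] = System.nanoTime() - start;
        }
        double mean = 0;
        for (double t : times) mean += t;
        mean /= samples;
        double var = 0;
        for (double t : times) var += (t - mean) * (t - mean);
        double stdDev = Math.sqrt(var / samples);
        System.out.printf("mean = %.0f ns, std dev = %.0f ns%n", mean, stdDev);
    }
}

The same structure carries over to C; only the warm-up step matters less there.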
I'm currently prototyping a multimedia editing application in Java (pretty much like Sony Vegas or Adobe After Effects) geared towards a slightly different end.
Now, before reinventing the wheel, I'd like to ask if there's any library out there geared towards time simulation/manipulation.
What I mean specifically: an ideal solution would be a library that can:
Schedule and generate events based on an elastic time factor. For example, real time would have a factor of 1.0, slow motion any lower value, and a time speed-up any higher value.
Provide configurable granularity; in other words, a way to specify how frequently time-based events fire (30 frames per second, 60 fps, etc.).
Provide an event execution mechanism, of course: a way to define that an event starts and terminates at certain points in time, and to get notified accordingly.
Is there any Java framework out there that can do this?
Thank you for your time and help!
Well, it seems that no such thing exists for Java. However, I found out that this is a specific case of a more general problem.
http://gafferongames.com/game-physics/fix-your-timestep/
Using fixed time stepping, my application gets frame skipping for free (e.g. when doing live preview rendering) and can render with no time constraints when in offline mode, which is pretty much what Vegas and other multimedia programs do.
Also, by applying a delta factor between frames, the whole simulation can be sped up or slowed down at will. So yeah, fixed time stepping pretty much nails it for me.
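For anyone finding this later, here is a rough Java sketch of that fixed-timestep loop with an elastic time factor (updateSimulation() and renderFrame() are placeholders, and the exact pacing strategy is up to the application):

public class FixedStepClock {
    public static void main(String[] args) throws InterruptedException {
        final double fps = 30.0;                // configurable granularity (events per second)
        final double stepSeconds = 1.0 / fps;   // fixed simulation step
        double timeFactor = 0.5;                // 1.0 = real time, <1.0 = slow motion, >1.0 = speed-up
        double accumulator = 0.0;
        long previous = System.nanoTime();

        while (true) {
            long now = System.nanoTime();
            double frameSeconds = (now - previous) / 1e9;
            previous = now;

            // Scale wall-clock time by the elastic factor before accumulating it.
            accumulator += frameSeconds * timeFactor;

            // Consume the accumulator in fixed steps; frame skipping falls out naturally
            // when the accumulator grows faster than the renderer can keep up.
            while (accumulator >= stepSeconds) {
                updateSimulation(stepSeconds);  // fire any events scheduled inside this step
                accumulator -= stepSeconds;
            }
            renderFrame();
            Thread.sleep(1);                    // crude pacing; an offline render would skip this
        }
    }

    static void updateSimulation(double dt) { /* advance scheduled events by dt seconds */ }
    static void renderFrame() { /* draw the current state */ }
}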
I'm gathering some data about the difference in performance between a JVM method call and a remote method call using a binary protocol (in other words, not SOAP). I am developing a framework in which a method call may be local or remote at the discretion of the framework, and I'm wondering at what point it's "worth it" to evaluate the method remotely, either on a much faster server or on a compute grid of some kind. I know that a remote call is going to be much, much slower, so I'm mostly interested in understanding the order-of-magnitude difference. Is it 10 times slower, or 100, or 1,000? Does anyone have any data on this? I'll write my own benchmarks if necessary, but I'm hoping to re-use some existing knowledge. Thanks!
Having developed a low-latency RMI (~20 microseconds minimum), I can say it is still about 1,000x slower than a direct call. If you use plain Java RMI (~500 microseconds minimum), it can be 25,000x slower.
NOTE: This is only a very rough estimate to give you a general idea of the difference you might see. There are many complex factors which could change these numbers dramatically. Depending on what the method does, the difference could be much lower, especially if you perform RMI to the same process; if the network is relatively slow, the difference could be much larger.
Additionally, even when there is a very large relative difference, it may be that it won't make much difference across your whole application.
To elaborate on my last comment...
Let's say you have a GUI which has to poll some data every second, and it uses a background thread to do this. Let's say the RMI call takes 50 ms, while the alternative, a direct method call to a local copy of a distributed cache, takes 0.0005 ms. That looks like an enormous difference: 100,000x. However, the RMI call can simply be started 50 ms earlier and still poll every second, so the difference to the user is next to nothing.
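A small sketch of that polling setup (fetchData() and updateGui() are placeholders): whether the fetch takes 50 ms over RMI or half a microsecond from a local cache, the user still sees one update per second.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class OncePerSecondPoller {
    public static void main(String[] args) {
        ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();
        // The background thread absorbs the call latency; the GUI just sees fresh data.
        poller.scheduleAtFixedRate(() -> {
            Object data = fetchData();   // 50 ms remote call or 0.0005 ms local call
            updateGui(data);             // refresh rate is one second either way
        }, 0, 1, TimeUnit.SECONDS);
    }

    static Object fetchData() { return new Object(); }
    static void updateGui(Object data) { /* update the display on the UI thread */ }
}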
What could be much more important is whether RMI, compared with another approach, is much simpler (if it's the right tool for the job).
An alternative to RMI is JMS. Which is best depends on your situation.
It's impossible to answer your question precisely. The ratio of execution times will depend on factors like:
The size / complexity of the parameters and return values that need to be serialized for the remote call.
The execution time of the method itself
The bandwidth / latency of the network connection
But in general, direct JVM method calls are very fast, and any kind of serialization coupled with the network delay of RMI is going to add significant overhead. Have a look at these numbers for a rough estimate of the overhead:
http://surana.wordpress.com/2009/01/01/numbers-everyone-should-know/
Apart from that, you'll need to benchmark.
One piece of advice: make sure you use a really good binary serialization library (Avro, Protocol Buffers, Kryo, etc.) coupled with a decent communications framework (e.g. Netty). These tools are far better than the standard Java serialization/IO facilities, and probably better than anything you can code yourself in a reasonable amount of time.
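As a rough illustration of the serialization half (not a full Netty transport), here is a minimal Kryo round trip; the Payload class is hypothetical, and the code follows Kryo's basic write/read pattern, which can differ slightly between versions.

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

public class KryoRoundTrip {
    static class Payload {               // hypothetical value object sent over the wire
        String name;
        double[] values;
    }

    public static void main(String[] args) {
        Kryo kryo = new Kryo();
        kryo.register(Payload.class);    // registration keeps the stream compact
        kryo.register(double[].class);

        Payload in = new Payload();
        in.name = "sensor-1";
        in.values = new double[] {1.0, 2.0, 3.0};

        // Serialize into a growable in-memory buffer.
        Output output = new Output(1024, -1);
        kryo.writeObject(output, in);
        byte[] bytes = output.toBytes(); // these bytes are what the transport would ship
        output.close();

        // Deserialize on the receiving side.
        Input input = new Input(bytes);
        Payload out = kryo.readObject(input, Payload.class);
        input.close();

        System.out.println(out.name + " -> " + out.values.length + " values");
    }
}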
No one can tell you the answer, because the decision of whether or not to distribute is not about speed. If it was, you would never make a distributed call, because it will always be slower than the same call made in-memory.
You distribute components so multiple clients can share them. If the sharing is what's important, it outweighs the speed hit.
Your break-even point has to do with how valuable it is to share functionality, not with method call speed.