I need an algorithm (or a set of good ones to compare across various input data) that will reduce the noise level of a voice audio signal in Java without appreciably distorting the signal.
The input is an audio signal that includes a voice along with some background noise. The noise varies over the course of the recording. Ways to remove noise like this definitely exist; they were developed for voice recognition and film production.
The desired output is a minimally distorted voice signal in which the background distractions are minimally audible to the human ear. The quantitative criteria are a maximized signal-to-noise ratio and a minimized total harmonic distortion.
You are looking for adaptive noise removal and possibly a variety that adapts to changing noise conditions over time within the same stream or file.
Older approaches include:
Removing the frequencies containing the majority of the noise using bandpass and/or notch filters (which only works well if the desired signal and the noise band do not overlap)
Dropping the noise level at points between words, notes, or other audio events (in the dead space), as the Dolby noise reduction scheme does
Dropping the noise floor across an entire file using a Hamming or other window in conjunction with an FFT library (a minimal sketch of this approach appears below)
Hand-editing sections of an audio track in programs like Cakewalk or its competitors
These methods have been found less than desirable when trying to clean up a larger file or multiple files, or in real-time applications such as voice recognition or telephony.
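As a rough illustration of the windowed-FFT approach from the list above, here is a minimal spectral-subtraction sketch in Java. It assumes the JTransforms library for the FFT; the noise-floor estimate is passed in by the caller, and overlap-add between frames is omitted to keep it short:

```java
import org.jtransforms.fft.DoubleFFT_1D;

public class NoiseFloorDrop {
    /** Denoise one frame by spectral subtraction against an estimated
     *  noise floor. Overlap-add between frames is omitted for brevity. */
    static double[] denoiseFrame(double[] frame, double noiseFloor) {
        int n = frame.length;
        double[] buf = new double[n];
        // Hamming window to reduce spectral leakage at the frame edges
        for (int i = 0; i < n; i++) {
            double w = 0.54 - 0.46 * Math.cos(2 * Math.PI * i / (n - 1));
            buf[i] = frame[i] * w;
        }
        DoubleFFT_1D fft = new DoubleFFT_1D(n);
        fft.realForward(buf);  // packed real FFT, in place
        // Attenuate each complex bin toward zero when its magnitude is
        // near the noise floor (DC and Nyquist bins left untouched)
        for (int k = 1; k < n / 2; k++) {
            double re = buf[2 * k], im = buf[2 * k + 1];
            double mag = Math.hypot(re, im);
            double gain = Math.max(0.0, (mag - noiseFloor) / (mag + 1e-12));
            buf[2 * k] = re * gain;
            buf[2 * k + 1] = im * gain;
        }
        fft.realInverse(buf, true);  // back to the time domain, scaled by 1/n
        return buf;
    }
}
```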
One Java program that I have not personally tried is here. Even though it has some level of automation, it is an LSE (least squares estimator), which works across a block of data but is not suitable for continuous operation or for an audio file with changing unwanted noise conditions. (It's not as adaptive as one might hope.)
The solution that I found after much investigation and now use all the time is not written in Java. It is a MATLAB program that can also run in open source Octave with minimal modification. I started porting it to C++ but ran out of time to finish that.
The class of algorithm it implements is called MMSE (noise reduction using minimum mean square error estimators). The MATLAB version has been refined several times by Dr. Hendricks, concluding with the 2010 version.
I've compared it with its competitors for both dialog and music, and it equals or surpasses the others in all cases I've tried. (I have no affiliation with Dr. Hendricks or MATLAB other than that I like the results I've been getting from his implementation on that platform.)
I have been experimenting with ways to use the processing power of two computers together as one (not by physically connecting them, but by splitting a task in half so each computer does one half; the result from the "helper" computer is then sent back over the internet to be combined with the result from the "main" computer).
I've been using this method to compute fractal images and it works great. The left half and the right half of the image are computed on separate computers, then combined into one. The process of sending one half of the image to the other computer and combining them takes maybe a second, so the efficiency is great and cuts time down by about half.
The problem comes when you want to do this "multi computer processing" with something that needs data exchanged very frequently.
For example, I'd like to use this for something like an n-body simulation. You need the data exchange to happen multiple times per second, so if each exchange takes about a second, it actually takes much longer with two computers than it would with one.
So how do online video games do it? The players around you, what they are doing, what they are wearing, everything going on has to be exchanged between everyone playing many times per second.
I'm just looking for general ideas on how to send larger amounts of data at high speed.
The way I have been doing it is with PHP on a free hosting site. The helper computer computes its half of the data, then sends it to a PHP script, which saves that data somewhere. The main computer then reads this and combines it with the data it has already computed.
I have a feeling PHP isn't the way to go, but I don't know much about this sort of thing.
Your first step will be to move from using HTTP requests to using sockets directly. This will give you much more control over the communication and improve performance by removing the overhead of the HTTP protocol (which is potentially significant). With sockets, your programs can also communicate with each other directly rather than through the PHP-based relay.
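To make that concrete, here is a minimal sketch of the helper side of a direct socket exchange in Java. The port number, the double[] payload, and the compute method are illustrative stand-ins for whatever your task actually needs; the main computer would connect with a plain Socket and matching Data streams:

```java
import java.io.*;
import java.net.*;

/** Helper machine: accept a connection from the main computer, read a
 *  work unit, compute the results, and stream them straight back. */
public class Helper {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(5000);
             Socket main = server.accept();  // main computer connects here
             DataInputStream in = new DataInputStream(main.getInputStream());
             DataOutputStream out = new DataOutputStream(main.getOutputStream())) {
            int n = in.readInt();            // size of the work unit
            double[] work = new double[n];
            for (int i = 0; i < n; i++) work[i] = in.readDouble();
            for (int i = 0; i < n; i++) out.writeDouble(compute(work[i]));
            out.flush();                     // push results back immediately
        }
    }
    static double compute(double x) { return x * x; }  // stand-in for the real task
}
```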
There are a ton of guides online as to how you would do this sort of system, and I would recommend Googling things like "game networking" and "distributed computing".
Here is one series of articles that I have found useful in the past, that covers the sort of things that you will want to read about: http://gafferongames.com/networking-for-game-programmers/
(He doesn't use Java, but the ideas are universal)
I want to make a video transcoder as a holiday project in Java. I was looking into the basics of video files and came across terms like containers, bit rate, bit depth and so on.
I have questions regarding bitrate.
I know the bit rate is the amount of data contained in the video per second. Audio also has a bit rate, but it is considerably lower than that of video, obviously.
So, say there is an 8 Mbps video (YouTube HD 720p) and the user wants to transcode it to a lower bit rate of 4 Mbps.
Will this cause the sound to go out of sync?
I am not doing the transcoding myself; I am using Xuggler for it, which supports a lot of codecs, H.264 among them.
Also, if, by accident, the user decides to convert a 4 Mbps video to an 8 Mbps video, what will happen?
This situation is possible if the user gives a video captured from a phone camera and decides to store it in DVD quality.
Also, there are other things to take into consideration, like frame rate, right? A low-capacity device cannot handle a higher frame rate. Is frame rate related to bit rate?
There are several possibilities of what will happen, depending on the decoder and so on. I'm not familiar with Xuggler, but:
The sound should not go out of sync if you drop the video bit rate with proper software. It will not shorten the video or anything like that. Depending on what you do, either the frame rate will drop (e.g., every second frame is discarded) or each frame will be compressed more heavily.
The audio and video streams are generally independent, so changing the bit rate of one will not affect the other. When converting to a higher bit rate, the transcoder will either throw an error or produce a larger file at the same quality as the original.
The frame rate is not directly related. The bit rate is just a measure of how many bits are being used to encode one second of audio or video.
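For what it's worth, the canonical Xuggler MediaTool transcode loop looks roughly like the sketch below. The file names are placeholders, and choosing a specific target bit rate requires configuring the writer's stream settings, which this minimal version omits:

```java
import com.xuggle.mediatool.IMediaReader;
import com.xuggle.mediatool.IMediaWriter;
import com.xuggle.mediatool.ToolFactory;

public class Transcode {
    public static void main(String[] args) {
        // Decode packets from the source file
        IMediaReader reader = ToolFactory.makeReader("input-720p.mp4");
        // The writer copies the reader's stream layout and re-encodes
        IMediaWriter writer = ToolFactory.makeWriter("output.mp4", reader);
        reader.addListener(writer);
        // Pump packets until end of file (readPacket returns null while OK)
        while (reader.readPacket() == null) ;
    }
}
```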
I am thinking of assembling this system:
AMD CPU (A8-3870 APU, which has a Radeon HD 6550D inside: 400 stream processors, xxx GFLOPS), nearly $110
AMD graphics card: HD 7750 (512 stream processors, 819 GFLOPS peak performance), nearly $170
Appropriate RAM (1600 MHz bus) and mainboard
Can I achieve the 819 + xxx GFLOPS peak performance mentioned on the official sites using OpenCL and similar programs?
Can I use all 912 cores with OpenCL/JOCL, and is it important to add CPU cores to the pot (4 of them, though of course 2 will be used to feed the GPU)?
C++ or Java: which has the better libraries for using the multiple GPUs or APUs present in a computer?
What happens if I skip both the APU and the GPU and buy a single Nvidia GTX 660? Does that win? ($229, 1800 GFLOPS, with the cheapest simple 4-core CPU, no APU)
I am not trying to ask a versus question. I need to know what would be better for scientific computing (75% of the time) and gaming (25% of the time), because I have a low budget. By "scientific calculations" I mean fluid dynamics and solid-state physics simulations. By games I mean those that have OpenCL and PhysX.
Can you give a very, very minimal, simple example of OpenCL code using multiple GPUs?
Thank you.
Can I achieve the 819 + xxx GFLOPS mentioned on the official sites using OpenCL and similar programs?
This is the peak performance. One definition of peak performance is: a manufacturer's guarantee not to exceed this rating.
You can most likely achieve this number, but not while doing anything useful. What you can achieve for your specific requirement depends greatly on what it is. In reality, you might expect to get 0.1% to 10% of this value.
C++ or Java: which has the better libraries for using the multiple GPUs or APUs present in a computer?
I would use whichever you are most comfortable with. You can call the GPU from either; the kernel language is C-like, so it doesn't matter much what the "host" language is.
What happens if I skip both the APU and the GPU and buy a single Nvidia GTX 660?
Impossible to say, but there is a good chance whatever you choose will be okay.
Can you give a very, very minimal, simple example of OpenCL code using multiple GPUs?
There are lots of examples on the web, but you really need to focus on what you will be using the system for.
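That said, here is a rough JOCL sketch (the Java binding mentioned in the question) that enumerates every GPU and creates one context and queue per device, which is the usual starting point for splitting work across an APU and a discrete card. Error handling and the actual kernel are omitted:

```java
import org.jocl.*;
import static org.jocl.CL.*;

/** Enumerate all GPUs on all OpenCL platforms and create one
 *  context/command queue per device for later work splitting. */
public class MultiGpuSetup {
    public static void main(String[] args) {
        CL.setExceptionsEnabled(true);
        int[] numPlatforms = new int[1];
        clGetPlatformIDs(0, null, numPlatforms);
        cl_platform_id[] platforms = new cl_platform_id[numPlatforms[0]];
        clGetPlatformIDs(platforms.length, platforms, null);
        for (cl_platform_id platform : platforms) {
            int[] numDevices = new int[1];
            clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, null, numDevices);
            if (numDevices[0] == 0) continue;   // platform has no GPUs
            cl_device_id[] devices = new cl_device_id[numDevices[0]];
            clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, devices.length, devices, null);
            for (cl_device_id device : devices) {
                cl_context_properties props = new cl_context_properties();
                props.addProperty(CL_CONTEXT_PLATFORM, platform);
                cl_context context = clCreateContext(props, 1,
                        new cl_device_id[]{device}, null, null, null);
                cl_command_queue queue = clCreateCommandQueue(context, device, 0, null);
                // Each (context, queue) pair can now run its share of the kernels.
            }
        }
    }
}
```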
I'm trying to develop an application that can identify a bird sound from a WAV file recording. When creating the database, I'm using another collection of sound clips and am trying to derive a unique identification for each of them. I'm planning to do this using the FFT. (I don't have any issues with these concepts.) The question is: is it important to remove the noise from these base recordings before creating the unique identification? If so, can anyone help me with the concept of "zero-crossing rate" and other techniques to clean the sound file of noise and silence? Thanks in advance.
In general, there is no way to remove noise unless you already have an accurate way of identifying a temporal or spectral difference between the noise and the signal of interest. For instance, if you know the exact frequency bandwidth of the entire signal of interest, then you can use DSP to filter out the spectrum outside of that bandwidth. If you know the minimum amplitude of your signal of interest, then you can clip out everything below that level.
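To make the amplitude-gating idea concrete, here is a minimal Java sketch of frame-wise energy and zero-crossing rate (the measure the question asks about). The threshold is an illustrative assumption for samples normalized to [-1, 1], not a recommended value:

```java
/** Frame-wise zero-crossing rate (ZCR) and energy: two simple cues for
 *  trimming silence or noise-only regions before fingerprinting. */
public class SilenceTrimmer {
    // Fraction of adjacent sample pairs whose sign differs.
    static double zeroCrossingRate(double[] frame) {
        int crossings = 0;
        for (int i = 1; i < frame.length; i++)
            if ((frame[i - 1] >= 0) != (frame[i] >= 0)) crossings++;
        return (double) crossings / (frame.length - 1);
    }

    // Mean squared amplitude of the frame.
    static double energy(double[] frame) {
        double sum = 0;
        for (double s : frame) sum += s * s;
        return sum / frame.length;
    }

    // A frame with very low energy is treated as silence; a high ZCR at
    // low energy often indicates broadband noise rather than birdsong.
    static boolean isSilence(double[] frame) {
        return energy(frame) < 1e-4;  // assumed threshold for [-1, 1] samples
    }
}
```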
I'm working on a polyphonic music transcription project. I have read some papers and gone through articles that explain similar tasks. I am very confused about many aspects of the problem domain and hope someone will be able to help me.
So far I have obtained a stream of decoded audio data from a given mp3.
I have understood that onset detection is the first step towards transcription. Is there any Java library available that can be used for detecting onsets?
Next, detecting the fundamental frequency is also done with the use of the FFT, as I have read.
I want to know how the FFT is used in these tasks (I'm not very familiar with the FFT). Is it absolutely necessary to perform an FFT for onset detection and F0 detection?
If not, what are the other ways?
Can I perform the FFT on the audio stream I have, or is there some other operation that has to be done first in order to manipulate this audio data?
Thanks a lot.
This field is known as machine listening.
Polyphonic transcription of digitally encoded music is one of the holy grails of machine listening. It is an unsolved problem, and an area of active research. The sub-fields include:
Onset detection
Beat extraction (detection of the metric structure, time signature, etc.)
Pitch detection (possible using autocorrelation, among other methods, on monophonic signals, but an unsolved problem when applied to complex polyphonic music)
Key detection (key signature detection)
Depending on the nature of your project, you might find it useful to explore the SuperCollider programming environment. SC is a language designed for projects such as this, already has a large number of machine listening plugins (ugens), and a comprehensive framework for dealing with FFT, audio signals, and much more.
This question about note onset detection contains a lot of information which may be useful to you.
This sounds like a huge but very interesting project. Good luck to you.
Music transcription means creating music notation from sound (or audio data). While accomplished musicians, and especially composers, are able to do this, it is an extremely difficult task for a machine, and as far as I know there has been little success so far, mostly academic experiments.
Basically, to recognize notes, you want to know where they start, where they end, and what their pitch is. The Fourier transform is the most basic way to turn the time domain (audio data) into the frequency domain (pitches), in principle. In practice, musical instruments generate lots of harmonics (overtones), and when polyphony (many F0s) is added, it's a mess.
You could try feeding something like 50-millisecond sequential slices of the audio data to the FFT. This way you would get the spectrum of each slice; you could then detect the strongest peaks in each slice and infer the rhythm from what happens between successive slices.
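A rough Java sketch of that slicing idea, assuming the JTransforms library for the FFT (the frame size, sample rate, and synthesized test tone are illustrative):

```java
import org.jtransforms.fft.DoubleFFT_1D;

/** Slice audio into ~50 ms frames, FFT each one, and report the
 *  strongest spectral peak per frame. No windowing, overlap, or
 *  harmonic handling here; this is only the bare slicing idea. */
public class SlicePeaks {
    public static void main(String[] args) {
        double sampleRate = 44100;
        int frameSize = 2048;                 // ~46 ms at 44.1 kHz
        double[] audio = new double[44100];   // stand-in for decoded samples
        for (int i = 0; i < audio.length; i++)
            audio[i] = Math.sin(2 * Math.PI * 440 * i / sampleRate); // 440 Hz test tone
        DoubleFFT_1D fft = new DoubleFFT_1D(frameSize);
        for (int start = 0; start + frameSize <= audio.length; start += frameSize) {
            double[] buf = new double[frameSize];
            System.arraycopy(audio, start, buf, 0, frameSize);
            fft.realForward(buf);             // packed real FFT, in place
            int peak = 1;
            double best = 0;
            for (int k = 1; k < frameSize / 2; k++) {
                double mag = Math.hypot(buf[2 * k], buf[2 * k + 1]);
                if (mag > best) { best = mag; peak = k; }
            }
            System.out.printf("t=%.2fs strongest bin ~%.1f Hz%n",
                    start / sampleRate, peak * sampleRate / frameSize);
        }
    }
}
```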
Sorry I couldn't help more, but I just wanted to point out that what you're trying to do is extremely difficult, seriously. Perhaps you should start with something simpler, like detecting one-note sine-wave melodies. Good luck!
For detecting the fundamental frequency of the melody in polyphonic music you can try out the MELODIA vamp plug-in (non-commercial use only): http://mtg.upf.edu/technologies/melodia
If you want to implement a melody extraction algorithm yourself you're going to have to check out the current state-of-the-art in research, a good place to start might be the MIREX melody extraction annual evaluation campaign: http://www.music-ir.org/mirex/wiki/Audio_Melody_Extraction
That, or just google "melody extraction" ;)