Synchronizing lyrics and song automatically - Java

I'm trying to create an Android app that will get the lyrics of an MP3 from its ID3v2 tag. My question is: is it possible to have the lyrics highlighted automatically as the song plays, using speech processing or something similar? I've looked into previous similar questions, but all of them require manual input. I need feedback ASAP. Thank you.

This kind of thing is possible on Hollywood movie sets, using technology similar to those image enhancements that reconstruct a face using a 4-pixel square as input.
Okay, so your request is theoretically more feasible, but no current phone technology I know of could do this on the fly. You might need a DeLorean, a flux capacitor, and some plutonium.
Also, detecting vocals over music is a much harder problem than dictating a text message to your phone:
Sung lyrics do not usually follow natural speech rhythm;
The frequency spectrum of music tends to conflict with the frequency spectrum of voice;
The voice varies in pitch, making it much harder to isolate and detect phonetic features;
The vocals are often mixed at a level comparable to that of all the other instruments;
IwannahuhIwannahuhIwannahuhIwannahuhIwannaReallireallirealliwannaZigaZiggUHH.

You might take a look at the paper "LyricSynchronizer: Automatic Synchronization System Between Musical Audio Signals and Lyrics" for a possible solution. There's nothing implemented in Java for Android, but with the NDK you might take any C code and finagle it to work. ;-)
This paper describes a system that can automatically synchronize polyphonic musical audio signals with their corresponding lyrics. Although methods for synchronizing monophonic speech signals and corresponding text transcriptions by using Viterbi alignment techniques have been proposed, these methods cannot be applied to vocals in CD recordings because vocals are often overlapped by accompaniment sounds. In addition to a conventional method for reducing the influence of the accompaniment sounds, we therefore developed four methods to overcome this problem: a method for detecting vocal sections, a method for constructing robust phoneme networks, a method for detecting fricative sounds, and a method for adapting a speech-recognizer phone model to segregated vocal signals. We then report experimental results for each of these methods and also describe our music playback interface that utilizes our system for synchronizing music and lyrics.
Best of luck in your implementation!

Related

Identify sound clip in a wav file using Java

I am working on a personal project. Basically, I have a collection of small sound clips, like a clap or a beep noise. I want to create a program that listens for sounds via a mic or some other form of audio input, and when I play a sound clip it should identify that clip.
I have tried looking into this myself and have found this article.
http://www.redcode.nl/blog/2010/06/creating-shazam-in-java/
I tried replicating it, but I have found that it doesn't work as expected. I am guessing the sound clips I am creating my hashes from are too short to produce enough values to compare.
I'm wondering if there are any well-known programs or algorithms that are capable of doing this.
Dan Ellis' slides are probably a good start. They explain the principal task of audio fingerprinting and the two best-known approaches:
The Shazam algorithm by A. Wang (paper)
The Philips (now Gracenote) algorithm by Haitsma/Kalker (paper)
As you have already tried the landmark (Shazam) approach, perhaps it's worth your time to fiddle around with the stream-based approach. Since your queries are very short, you might also want to tweak the analysis frame length and overlap. Shorter frames and greater overlap may improve your results for very short samples. If you want to delve even deeper into the Haitsma/Kalker algorithm, you might also be interested in this unfortunately paywalled paper (by me).
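For orientation, here is a minimal Java sketch of the core bit-derivation step described in the Haitsma/Kalker paper, assuming the FFT and the log-spaced band-energy extraction (33 bands between roughly 300 Hz and 2 kHz) happen upstream. This is illustrative, not the reference implementation, and the class and method names are made up:

    /**
     * Sketch of the Haitsma/Kalker sub-fingerprint step (not the full system):
     * given band energies for consecutive analysis frames, derive one 32-bit
     * sub-fingerprint per frame. Band extraction is assumed to happen elsewhere.
     */
    public class SubFingerprint {

        /**
         * @param energies [frame][band] energies, 33 bands per frame
         * @return one 32-bit sub-fingerprint per frame (starting from frame 1)
         */
        public static int[] fingerprints(double[][] energies) {
            int frames = energies.length;
            int[] fp = new int[frames - 1];
            for (int n = 1; n < frames; n++) {
                int bits = 0;
                for (int m = 0; m < 32; m++) {
                    // Sign of the difference of band-energy differences,
                    // across bands m/m+1 and frames n/n-1.
                    double d = (energies[n][m]     - energies[n][m + 1])
                             - (energies[n - 1][m] - energies[n - 1][m + 1]);
                    if (d > 0) {
                        bits |= 1 << m;
                    }
                }
                fp[n - 1] = bits;
            }
            return fp;
        }

        /** Hamming distance between two fingerprint blocks: the match metric. */
        public static int hamming(int[] a, int[] b) {
            int d = 0;
            for (int i = 0; i < Math.min(a.length, b.length); i++) {
                d += Integer.bitCount(a[i] ^ b[i]);
            }
            return d;
        }
    }

Matching then amounts to looking up stored blocks with a low Hamming distance to the query's sub-fingerprints, which is another reason dense frame overlap helps very short queries.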

Procedural audio for rolling dice

I'm looking for suggestions on how to approach randomizing the audio for rolling a pair of D6 dice in my game. I'd like each roll of the dice to sound different but remain reasonably plausible, and I don't want to make a bunch of pre-recorded dice rolls; I'd like more variation. Note that I do not need to synchronize the audio with animations (the animation I'm using is very simplistic and abstract).
My assumption is that I need a couple of basic audio snippets for the sound of a single die hitting a surface once. (Any suggestions for generating those? Or is it best to capture them?) Then I would need some way to mutate and combine variations on that basic sound to create a unique roll sound... Or am I just too ignorant to understand how complex that would be, and should I try a different approach?
I'm developing in Java for Android, but tutorials or descriptions of how to reasonably combine or procedurally generate audio in any language would be appreciated. I don't need real-time support as I think I could just generate the next roll's audio in advance and cache it until the dice are actually rolled.
The dice could generate two types of sounds: one when a die hits the environment, and one when it hits the other die. Reading a spectrogram of a dice collision sound could give you the ratios of the overtones. It's easy to prototype this in software like Pure Data. The idea is that by varying the fundamental frequency slightly you get a more procedural collision sound. You can use random numbers to drive the variation and to decide when collisions occur. It's probably not the perfect sound, but it could be a start.
This is probably very vague, but I hope it still helps :D.
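To make the overtone idea above a bit more concrete, here is a hedged Java sketch that synthesizes a single die "click" from a few decaying partials. The modal ratios are those of a free vibrating bar; every other constant is illustrative rather than measured from a real die:

    import java.util.Random;

    /** Sketch: synthesize one die "click" as a short burst of decaying
     *  partials. The overtone ratios approximate a free-free bar
     *  (1 : 2.76 : 5.40 : 8.93); all other constants are made up. */
    public class DieClick {
        private static final double[] RATIOS = {1.0, 2.756, 5.404, 8.933};

        public static float[] synthesize(Random rng, int sampleRate) {
            // Randomize the fundamental a little per hit for variation.
            double f0 = 2000 * (0.9 + 0.2 * rng.nextDouble()); // ~1.8-2.2 kHz
            double durSec = 0.06;                              // short, percussive
            int n = (int) (durSec * sampleRate);
            float[] out = new float[n];
            for (int i = 0; i < n; i++) {
                double t = (double) i / sampleRate;
                double env = Math.exp(-t * 80.0);              // fast overall decay
                double s = 0;
                for (int k = 0; k < RATIOS.length; k++) {
                    // Higher partials start quieter and decay faster.
                    double amp = 1.0 / (k + 1);
                    s += amp * Math.exp(-t * 40.0 * k)
                             * Math.sin(2 * Math.PI * f0 * RATIOS[k] * t);
                }
                out[i] = (float) (0.3 * env * s);
            }
            return out;
        }
    }

A full roll would then be several such clicks mixed at random offsets, which is essentially what the next answer does with recorded samples instead of synthesized ones.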
My solution would be to use free recording software to capture the sounds of one die rolling at a time. The right software would be able to chop those files into small samples containing each percussive hit and the residual audio. Just a few recorded dice rolls would give you dozens of samples of "hits".
Then, using Java, load references to each sample and design an algorithm that plays them back in a semi-random sequence with appropriate timing variations. That way you'd have plenty of variety on the fly without needing to mix the audio into a single stream before playback.
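As a sketch of that playback idea on Android, assuming you've already chopped your recordings into hit samples and added them as raw resources (the resource IDs and class name below are placeholders), SoundPool's per-stream volume and rate parameters give cheap variation:

    import android.content.Context;
    import android.media.AudioManager;
    import android.media.SoundPool;
    import android.os.Handler;
    import java.util.Random;

    /** Sketch: play pre-chopped "hit" samples in a semi-random sequence. */
    public class DiceRollPlayer {
        private final SoundPool pool =
                new SoundPool(4, AudioManager.STREAM_MUSIC, 0);
        private final Handler handler = new Handler();
        private final Random rng = new Random();
        private final int[] hitIds;

        public DiceRollPlayer(Context ctx, int... rawResIds) {
            hitIds = new int[rawResIds.length];
            for (int i = 0; i < rawResIds.length; i++) {
                hitIds[i] = pool.load(ctx, rawResIds[i], 1);
            }
        }

        /** One roll: a burst of hits, quieter and sparser as the dice settle. */
        public void playRoll() {
            int hits = 4 + rng.nextInt(4);                       // 4-7 bounces
            long delayMs = 0;
            for (int i = 0; i < hits; i++) {
                final float vol = 1.0f - i * (0.8f / hits);      // decaying volume
                final float rate = 0.8f + 0.4f * rng.nextFloat(); // +/-20% pitch
                final int id = hitIds[rng.nextInt(hitIds.length)];
                handler.postDelayed(
                        () -> pool.play(id, vol, vol, 1, 0, rate),
                        delayMs);
                delayMs += 40 + rng.nextInt(60) + i * 30;        // widening gaps
            }
        }
    }

The rate parameter (0.5 to 2.0) doubles as a cheap pitch variation, which goes a long way toward making repeated samples sound distinct.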
You can look over the procedural code for the "Shaker" class from the Synthesis ToolKit (STK). STK is a C++ library, but the procedure for actually creating the audio samples isn't too hard to pull out. There are lots of types of shakers offered by default. I expect you could poke at the parameterization and make modifications if you wanted.

Noise and silence clearing in Java

I'm trying to develop an application that can identify a bird sound from a WAV recording. When creating the database I'm using another collection of sound clips and am trying to derive a unique identification for each of them. I'm planning to do this using an FFT. (I don't have any issues with these concepts.) The question is: is it important to clear the noise from these base recordings before creating the unique identification? If so, could anyone help me with the concept of "zero-crossing rate" and other techniques for clearing noise and silence from a sound file? Thanks in advance.
In general, there is no way to remove noise unless you already have an accurate way of identifying a temporal or spectral difference between the noise and the signal of interest. For instance, if you know the exact frequency bandwidth of the entire signal of interest, then you can use DSP to filter out the spectrum outside of that bandwidth. If you know the minimum amplitude of your signal of interest, then you can gate out everything below that level.
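Since the question also asks about zero-crossing rate, here is a hedged Java sketch of a frame-based gate that combines an RMS energy floor with a ZCR measure; all thresholds are illustrative and would need tuning for your recordings:

    /** Sketch: frame-based silence/noise gating on mono PCM samples.
     *  A frame is kept only if its RMS energy is above a floor; the
     *  zero-crossing rate is computed alongside because a high ZCR with
     *  low energy is typical of broadband noise rather than birdsong. */
    public class Gate {
        public static boolean[] keepFrames(float[] pcm, int frameLen,
                                           double rmsFloor, double zcrCeil) {
            int frames = pcm.length / frameLen;
            boolean[] keep = new boolean[frames];
            for (int f = 0; f < frames; f++) {
                int off = f * frameLen;
                double sumSq = 0;
                int crossings = 0;
                for (int i = 0; i < frameLen; i++) {
                    double s = pcm[off + i];
                    sumSq += s * s;
                    // Count sign changes between consecutive samples.
                    if (i > 0 && (pcm[off + i - 1] >= 0) != (s >= 0)) {
                        crossings++;
                    }
                }
                double rms = Math.sqrt(sumSq / frameLen);
                double zcr = (double) crossings / frameLen;
                keep[f] = rms > rmsFloor && zcr < zcrCeil;
            }
            return keep;
        }
    }

Frames that pass the gate can then be handed to your FFT stage; everything else is treated as silence or noise.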

Audio programming, Sound Processing and DSP

I was playing with a karaoke application on iPhone and came up with following questions:
The application allowed its users to control the volume of the artist's vocal, and even mute it. How is this possible?
Does adjusting the artist's vocal, setting an equalizer, etc. mean performing some transformation on particular frequencies? What sort of mathematics is required here (frequency-domain transformations)?
The application recorded the user's voice input via a mic. Assuming the sound is recorded in some format, the application was able to mix the recording with the karaoke track (with the artist's voice muted). How can this be done?
Did they play both the track and the voice recording simultaneously? Or did they insert an additional track (channel?) into the original, perhaps replacing one?
What sort of DSP is involved here? Is this possible in Java or Objective-C?
I am curious and if you have links to documents or books that can help me understand the mechanism here, please share.
Thanks.
I don't know that particular application; it probably has the vocal track recorded separately.
For generic two-channel stereo sound, the easiest vocal suppression can be performed by assuming that the artist's voice is balanced roughly equally between the two channels (acoustically, it appears in the center). The simplest 'DSP' would then be to subtract one channel from the other. This doesn't work that well with modern recordings, however, since all the instruments and the voice are recorded separately and then mixed together (meaning the voice will not necessarily be in phase between the two channels).
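As a concrete illustration of that subtraction trick, here is a minimal, hedged Java sketch operating on interleaved 16-bit stereo PCM (the class and method names are made up for the example):

    /** Sketch: naive "karaoke" vocal suppression on interleaved 16-bit
     *  stereo PCM. Works only to the extent the vocal is mixed identically
     *  into both channels. */
    public class CenterCut {
        public static short[] removeCenter(short[] interleavedStereo) {
            short[] out = new short[interleavedStereo.length / 2]; // mono result
            for (int i = 0; i < out.length; i++) {
                int left  = interleavedStereo[2 * i];
                int right = interleavedStereo[2 * i + 1];
                int diff = (left - right) / 2; // halve to avoid clipping
                out[i] = (short) diff;
            }
            return out;
        }
    }

Note that this also removes the bass and anything else panned dead center, which is one more reason real karaoke apps ship a separate backing track.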
I have written two detailed blog posts on how to get a custom EQ in iOS, but I have no details on how to do the DSP yourself. If you simply want to choose from a wide range of effects and such, try this.
The first post explains how to build libsox:
http://uberblo.gs/2011/04/iosiphoneos-equalizer-with-libsox-making-it-a-framework
The second explains how to use it:
http://uberblo.gs/2011/04/iosiphoneos-equalizer-with-libsox-doing-effects
Please upvote the answer if it helped you! Thanks!

Sound Processing - Beat Matching Music Player on Android

So I want to make a new music player for Android. It's going to be open source, and if you think this idea is any good, feel free to let me know and maybe we can work on it.
I know it's possible to speed up and slow down a song and normalize the sound so that the voices and instruments still hit the same pitch.
I'd like to make a media player for Android aimed at joggers which will:
Beat match successive songs
Maintain a constant beat for running to
Beat can be established via accelerometer or manually
Alarms and notifications automatically at points in the run (Geo located or timer)
Now I know that this will fall down in many use cases (slow songs sounding stupid, beat changes within a song getting messed up), but I feel they can be overcome. What I really need to know is how to get started writing an application in C++ (using the Android NDK) which will perform the analysis and adjust the stream.
Will it be feasible to do this on the fly? What approach would you use? A server that streams to the phone? Maybe offline analysis of the songs on a desktop that gets synched to your device via tether?
If this is too many questions for one post, I am most interested in the easiest way of analysing the waveform of an MP3 to find the beat. On top of that, how to perform the manipulation to change the beat would be my next point of interest.
I had a tiny, crappy MP3 player that could do double speed on the fly, so I'm sure it can be done!
Gav
This is technologically feasible on a smartphone-type device, although it is extremely difficult to achieve good-sounding pitch-shifting and time-stretching effects even on a powerful PC working offline.
Pitch-shifting and time-stretching can be achieved on a relatively powerful mobile device in realtime (I've done it in .Net CF on a Samsung i760 smartphone) without overly taxing the processor (the simple version is not much more expensive than ordinary MP3 playback). The effect is not great, although it doesn't sound too bad if the pitch and time changes are relatively small.
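For the curious, the "simple version" referred to above can be approximated with a naive granular overlap-add (OLA): read windowed grains from the input at one hop size and lay them down at another. This is a hedged sketch, not the answerer's actual implementation; there is no phase or waveform alignment (as in SOLA/WSOLA), so some warble is expected:

    /** Sketch of naive granular overlap-add (OLA) time-stretching: grains are
     *  read at hop = synthHop / stretch and written at synthHop, so a stretch
     *  of 1.2 plays 20% slower at the same pitch. */
    public class OlaStretch {
        public static float[] stretch(float[] in, double stretch) {
            final int grain = 2048;
            final int synthHop = grain / 2;           // 50% synthesis overlap
            final double analysisHop = synthHop / stretch;
            double[] window = new double[grain];
            for (int i = 0; i < grain; i++) {
                // Hann window: with 50% overlap, adjacent windows sum to ~1.
                window[i] = 0.5 - 0.5 * Math.cos(2 * Math.PI * i / grain);
            }
            int outLen = (int) (in.length * stretch) + grain;
            float[] out = new float[outLen];
            int g = 0;
            while (true) {
                int inPos = (int) (g * analysisHop);
                int outPos = g * synthHop;
                if (inPos + grain > in.length || outPos + grain > outLen) break;
                for (int i = 0; i < grain; i++) {
                    out[outPos + i] += (float) (window[i] * in[inPos + i]);
                }
                g++;
            }
            return out;
        }
    }

Each output sample touches only two grains, so the per-sample cost stays comparable to plain playback, which matches the "not much more expensive" observation above.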
Automatic determination of a song's tempo might be too time-consuming to do in real time, but this part of the process could be performed in advance of playback, or it could be done on the next song well before the current song is finished playing. I've never done this myself, so I dunno.
Everything else you mentioned is relatively easy to do. However, I don't know how friendly Android's API is regarding audio output, or even whether it allows the low-level access to audio playback that this project would require.
Actually, you'll have two problems:
Finding the tempo of a song is not easy. The most common method involves autocorrelation, which involves quite a bit of math, so I hope you've studied up (see the sketch after this answer).
Actually changing the tempo of a song without shifting its pitch is even harder, and it still introduces audible artifacts. Typically it takes a long time to edit audio this way, and a lot of tinkering to get the song to sound good. Performing this in real time would be very, very hard. The usual process involves taking the Fourier transform of the audio, shifting things in the frequency domain, and taking the inverse Fourier transform; more math, this time with complex numbers.
If you really want to work on this I suggest taking a class in signals and systems from an Electrical Engineering department.
Perhaps an easier idea: find the tempo of all the songs in a user's library, and just focus on playing songs with a beat close to the jogger's pace. You still need to do #1, but you don't need to worry about #2.
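To make #1 concrete, here is a rough, hedged Java sketch of tempo estimation by autocorrelating an onset-strength envelope; all constants are illustrative, and a real implementation would decode the MP3 to PCM first:

    /** Sketch: crude tempo estimation. Frame energies -> half-wave-rectified
     *  energy increases (onset strength) -> autocorrelation over lags
     *  corresponding to 60-180 BPM; the best lag gives the tempo. */
    public class TempoEstimator {
        public static double estimateBpm(float[] pcm, int sampleRate) {
            final int hop = 512;
            int frames = pcm.length / hop;
            // 1. Per-frame energy.
            double[] energy = new double[frames];
            for (int f = 0; f < frames; f++) {
                for (int i = 0; i < hop; i++) {
                    double s = pcm[f * hop + i];
                    energy[f] += s * s;
                }
            }
            // 2. Onset strength: keep only energy increases.
            double[] onset = new double[frames];
            for (int f = 1; f < frames; f++) {
                onset[f] = Math.max(0, energy[f] - energy[f - 1]);
            }
            // 3. Autocorrelate over lags covering 60-180 BPM.
            double framesPerSec = (double) sampleRate / hop;
            int minLag = (int) (framesPerSec * 60 / 180); // 180 BPM
            int maxLag = (int) (framesPerSec * 60 / 60);  //  60 BPM
            int bestLag = minLag;
            double best = Double.NEGATIVE_INFINITY;
            for (int lag = minLag; lag <= maxLag; lag++) {
                double r = 0;
                for (int f = 0; f + lag < frames; f++) {
                    r += onset[f] * onset[f + lag];
                }
                r /= (frames - lag); // normalize: longer lags sum fewer terms
                if (r > best) { best = r; bestLag = lag; }
            }
            return 60.0 * framesPerSec / bestLag;
        }
    }

Picking the library song whose estimated BPM best matches the jogger's stride then requires no time-stretching at all.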
Changing the audio speed on the fly is definitely doable; I'm not sure if it's doable on the G1.
Rather than writing your own from scratch, I would recommend looking at the MythTV and/or MPlayer source code. They both support speeding up video playback while pitch-compensating the audio.
http://picard.exceed.hu/tcpmp/test/
tcpmp did all that you asked for on an itty-bitty Palm Centro... and more, including video! If it can be done on a Palm Centro, it sure as heck can be done on Android!
