Lab 3: Better Time Domain Analysis
· Implement a better algorithm for finding words.
· Implement code to find F0 in the voiced regions.
Follow the plan described in Rabiner and Schafer section 4.4 to find word boundaries. Here are the steps.
· Compute the zero-crossing rate per 10 msec frame.
· Compute the average magnitude with a 10 msec window.
· Assume that the first 100 msec of the recording contains no speech. (That is, it contains only background noise.)
· Compute the maximum of the average magnitude on this interval. This will be a threshold for noise versus speech.
· Compute the average and maximum zero-crossing rate on this interval. Use these to determine a zero-crossing threshold.
· Find the endpoints of an interval where the average magnitude always exceeds a conservative threshold.
· Move out from those endpoints to where the average magnitude falls below a lower threshold.
· Move out from the left endpoint, at most 25 frames, to the left-most place where the zero-crossing rate falls below the zero-crossing threshold. If the zero-crossing threshold was exceeded at least 3 times along the way, accept the new endpoint. Otherwise, keep the old endpoint. Do the same heading to the right from the right endpoint.
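The steps above can be sketched roughly as follows. This is a Python illustration of the logic only, not the lab solution. The function names, the scale factors that turn the noise maximum into the lower and upper magnitude thresholds, and the rule for the zero-crossing threshold are all assumptions; the Rabiner and Sambur paper gives the actual formulas. Two simplifications are also assumed: the "conservative" interval is taken to run from the first to the last frame above the upper threshold, and the zero-crossing step moves the endpoint to the outermost frame in the 25-frame window whose zero-crossing count exceeds the threshold.

```python
def frame_features(x, fs, frame_ms=10):
    """Average magnitude and zero-crossing count for each non-overlapping frame."""
    n = int(fs * frame_ms / 1000)  # samples per frame
    mags, zcs = [], []
    for start in range(0, len(x) - n + 1, n):
        frame = x[start:start + n]
        mags.append(sum(abs(v) for v in frame) / n)
        # A zero crossing is a sign change between consecutive samples.
        zcs.append(sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:])))
    return mags, zcs


def _extend_by_zcr(zcs, edge, step, z_thresh, max_frames=25, min_hits=3):
    """Search up to max_frames beyond edge (step=-1 for left, +1 for right)."""
    hits = []
    for k in range(1, max_frames + 1):
        i = edge + step * k
        if i < 0 or i >= len(zcs):
            break
        if zcs[i] > z_thresh:
            hits.append(i)
    # Accept the outermost high-ZCR frame only if there were enough hits.
    return hits[-1] if len(hits) >= min_hits else edge


def find_endpoints(mags, zcs, noise_frames=10, lower_scale=2.0, upper_scale=5.0):
    """Return (start_frame, end_frame) of the word, or None if none is found."""
    # The first 100 msec (10 frames of 10 msec) is assumed to be background noise.
    noise_max = max(mags[:noise_frames])
    lower = lower_scale * noise_max  # assumed scaling, not from the handout
    upper = upper_scale * noise_max  # assumed scaling, not from the handout
    noise_z = zcs[:noise_frames]
    z_mean = sum(noise_z) / len(noise_z)
    z_thresh = z_mean + 2 * (max(noise_z) - z_mean)  # assumed rule

    # Conservative endpoints: first and last frame above the upper threshold.
    loud = [i for i, m in enumerate(mags) if m > upper]
    if not loud:
        return None
    start, end = loud[0], loud[-1]

    # Move outward while the magnitude stays above the lower threshold.
    while start > 0 and mags[start - 1] > lower:
        start -= 1
    while end < len(mags) - 1 and mags[end + 1] > lower:
        end += 1

    # Refine with the zero-crossing test, at most 25 frames on each side.
    start = _extend_by_zcr(zcs, start, -1, z_thresh)
    end = _extend_by_zcr(zcs, end, +1, z_thresh)
    return start, end
```

A typical call would be `mags, zcs = frame_features(x, fs)` followed by `find_endpoints(mags, zcs)`. The result is a pair of frame indices; multiply by the frame length in samples to get sample positions.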
The details are in:
L. R. Rabiner and M. R. Sambur, "An Algorithm for Determining the Endpoints of Isolated Utterances," Bell System Technical Journal, Vol. 54, No. 2, February 1975.
Compute F0 using the Matlab function xcorr to do short-time autocorrelation.
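As a rough illustration of the idea, here is a short-time autocorrelation F0 estimate for a single frame, written in plain Python so the arithmetic is visible. The lab itself calls for Matlab's xcorr; note that `xcorr(frame, maxlag)` returns values for lags -maxlag through maxlag, so only the non-negative half is needed. The function names, the 50 to 500 Hz search range, and the voicing ratio of 0.3 are assumptions made for this sketch, not values from the handout.

```python
import math


def autocorr(frame, max_lag):
    """r[k] = sum over n of frame[n] * frame[n + k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def estimate_f0(frame, fs, f0_min=50.0, f0_max=500.0, voicing_ratio=0.3):
    """Return F0 in Hz for one frame, or None if the frame looks unvoiced."""
    lag_min = max(1, int(fs / f0_max))               # shortest period searched
    lag_max = min(int(fs / f0_min), len(frame) - 1)  # longest period searched
    if lag_max < lag_min:
        return None
    r = autocorr(frame, lag_max)
    if r[0] <= 0:
        return None  # silent frame, no energy
    # The pitch period is taken as the lag of the largest peak in the range.
    best = max(range(lag_min, lag_max + 1), key=lambda k: r[k])
    # Assumed voicing test: the peak must be a sizable fraction of r[0].
    if r[best] < voicing_ratio * r[0]:
        return None
    return fs / best


# Usage: a 30 msec frame of a 200 Hz sine sampled at 8 kHz.
fs = 8000
frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(240)]
f0 = estimate_f0(frame, fs)
```

The frame should span at least two pitch periods at the lowest F0 of interest, which is why the usage example takes 30 msec rather than the 10 msec frames used for endpoint detection.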
Last Updated: January 30, 2004 5:50 p.m. by