Lab 3 – Better Time Domain Analysis

Goals:

·       Implement a better algorithm for finding words.

·       Implement code to find F0 in the voiced regions.

Finding Words

Follow the plan described in Rabiner and Schafer section 4.4 to find word boundaries.  Here are the steps.

·       Compute the zero-crossing rate for each 10 msec frame.

·       Compute the average magnitude with a 10 msec window.

·       Assume that the first 100 msec of the recording contain no speech. (That is, they contain only background noise.)

·       Compute the maximum of the average magnitude on this interval.  This will be a threshold for noise versus speech.

·       Compute the average and maximum zero-crossing rate on this interval.  Use these to determine a zero-crossing threshold.

·       Find the endpoints of an interval where the average magnitude always exceeds a conservative (high) threshold.

·       Move out from those endpoints to where the average magnitude falls below a lower threshold.

·       Move out from the left endpoint, at most 25 frames, to the left-most place where the zero-crossing rate falls below the zero-crossing threshold.  If the zero-crossing threshold was exceeded at least 3 times, accept the new endpoint.  Otherwise, keep the old endpoint.  Do the same heading to the right from the right endpoint.
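
The steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than Matlab, and it is not the procedure from the paper. The function names frame_features and find_endpoints, the scale factors for the lower and upper magnitude thresholds, and the formula that combines the average and maximum zero-crossing counts into a threshold are all assumptions made for the example. The sketch also reads the zero-crossing step as moving the endpoint to the outermost frame, within 25 frames, whose count exceeds the threshold.

```python
def frame_features(x, fs, frame_ms=10):
    """Average magnitude and zero-crossing count for each non-overlapping frame."""
    n = int(fs * frame_ms / 1000)                 # samples per frame
    mags, zcs = [], []
    for start in range(0, len(x) - n + 1, n):
        frame = x[start:start + n]
        mags.append(sum(abs(s) for s in frame) / n)
        # A crossing is a sign change between neighbouring samples (0 counts as positive).
        zcs.append(sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:])))
    return mags, zcs


def find_endpoints(mags, zcs, noise_frames=10, lower_scale=2.0,
                   upper_scale=5.0, max_move=25, min_hits=3):
    """Return (first_frame, last_frame) of the word, or None if none is found."""
    # Noise statistics from the first 100 msec (10 frames of 10 msec).
    noise_mag = max(mags[:noise_frames])
    noise_zc = zcs[:noise_frames]
    zc_mean = sum(noise_zc) / len(noise_zc)
    zc_max = max(noise_zc)
    # ASSUMPTION: these three formulas are illustrative, not from the handout.
    lower = lower_scale * noise_mag
    upper = upper_scale * lower
    zc_thr = zc_max + (zc_max - zc_mean)

    # Longest run of frames whose magnitude stays above the upper threshold.
    best, run_start = None, None
    for i, m in enumerate(mags + [float("-inf")]):   # sentinel closes a final run
        if m > upper:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if best is None or i - run_start > best[1] - best[0] + 1:
                best = (run_start, i - 1)
            run_start = None
    if best is None:
        return None
    left, right = best

    # Move outward while the magnitude is still above the lower threshold.
    while left > 0 and mags[left - 1] > lower:
        left -= 1
    while right < len(mags) - 1 and mags[right + 1] > lower:
        right += 1

    # Zero-crossing refinement: look at most max_move frames further out and
    # accept a new endpoint only if the threshold is exceeded min_hits times.
    hits = [i for i in range(max(0, left - max_move), left) if zcs[i] > zc_thr]
    if len(hits) >= min_hits:
        left = hits[0]
    hits = [i for i in range(right + 1, min(len(mags), right + 1 + max_move))
            if zcs[i] > zc_thr]
    if len(hits) >= min_hits:
        right = hits[-1]
    return left, right
```

Given samples x recorded at sampling rate fs, calling frame_features(x, fs) and passing the two lists to find_endpoints gives inclusive frame indices. Multiplying by the frame length converts them to sample positions. The zero-crossing step is what lets a quiet, noisy onset (a fricative, for example) be included even though its magnitude never reaches the lower threshold.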

The details are in

L. R. Rabiner and M.R. Sambur, “An Algorithm for Determining the Endpoints of Isolated Utterances,” Bell System Technical Journal, Vol. 54, No. 2, February 1975.

Finding F0

Compute F0 in the voiced regions, using the Matlab function xcorr to do the short-time autocorrelation.
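
As an illustration of the idea, here is a minimal sketch of autocorrelation-based F0 estimation for a single frame. It is written in Python rather than Matlab so that the lag sum is spelled out. The list returned by the helper autocorr is meant to correspond to the non-negative-lag half of what xcorr returns for a single input vector. The helper names, the 60 to 400 Hz search range, and the 0.3 voicing ratio are assumptions chosen for the example, not values from the lab.

```python
def autocorr(frame, max_lag):
    """Raw autocorrelation r[k] = sum over i of frame[i] * frame[i + k], k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0, voicing=0.3):
    """Return an F0 estimate in Hz, or None when the frame looks unvoiced."""
    # ASSUMPTION: the search range and voicing ratio are illustrative defaults.
    lag_min = int(fs / f0_max)                       # shortest pitch period searched
    lag_max = min(int(fs / f0_min), len(frame) - 1)  # longest pitch period searched
    r = autocorr(frame, lag_max)
    if r[0] <= 0.0:                                  # silent frame: no energy at all
        return None
    # The lag with the largest autocorrelation is taken as the pitch period.
    best = max(range(lag_min, lag_max + 1), key=lambda k: r[k])
    if r[best] / r[0] < voicing:                     # peak too weak to call voiced
        return None
    return fs / best
```

The frame should span at least two pitch periods of the lowest F0 of interest, so frames of 30 msec or so are a common choice. Applying estimate_f0 to successive frames inside the word boundaries gives an F0 track for the voiced regions.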


Last Updated: January 30, 2004 5:50 p.m. by

Harriet Fell
College of Computer Science, Northeastern University
360 Huntington Avenue #161CN,
Boston, MA 02115
Internet: fell@ccs.neu.edu
Phone: (617) 373-2198 / Fax: (617) 373-5121
The URL for this document is: http://www.ccs.neu.edu/home/fell/CSU610/SpeechSP2004Lab3.html