Lab 2 – Simple Time Domain Analysis

Goals:

· Develop and implement a first algorithm for finding words.

· Develop and implement a first pitch-tracking algorithm.

Energy and Zero-Crossing Code

Put get_energy.m and get_zcr.m on your Matlab path.  You can download them from the CSU610 web page.  Listings of both functions appear below.

Finding Words

Use the ‘energy’ computed by the ‘get_energy’ function below to help decide how many words there are and to locate the word boundaries.

 

To save you the trouble of struggling with Matlab structures, here is the start of my program that finds words.

 

function [y, word_boundaries] = finding_words(wav_file_name)

[y, Fs] = wavread(wav_file_name);   % on Matlab releases that no longer have wavread, use audioread instead

 

%%% constants you might experiment with

frame_len = 128;

time_overlap = 20;

thresh = .2;

energy = get_energy( y, frame_len, time_overlap);  

elow = (energy < thresh*max(energy));  % length(elow) = length(energy)

num_words = ceil(sum(abs(diff(elow)))/2);    % each word contributes two transitions in elow (start and end)

word_boundaries = zeros(num_words,2);

 

'word_boundaries' is a num_words x 2 array that will hold the start and end of each word.  Use word_boundaries(1,1) to store the start of word 1 and word_boundaries(1,2) to store the end of word 1.
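One possible way to fill in word_boundaries is to look for the places where the thresholded energy switches between quiet and loud.  The sketch below is only an illustration on a made-up energy contour; the variable names loud, edges, starts and stops are not part of the starter code, and the boundaries it produces are frame indices, not sample indices.

```matlab
% Toy energy contour (made-up values): two loud bursts separated by silence.
energy = [0 0 5 6 5 0 0 0 4 7 0 0];
thresh = .2;

elow = (energy < thresh*max(energy));   % 1 where the frame is quiet
loud = ~elow;                           % 1 where the frame is inside a word

% Pad with a quiet frame on each side so that a word touching the start
% or the end of the recording still produces both of its edges.
edges  = diff([0 loud 0]);
starts = find(edges == 1);              % first loud frame of each word
stops  = find(edges == -1) - 1;         % last loud frame of each word

word_boundaries = [starts(:) stops(:)]; % one row per word, in frame indices
disp(word_boundaries);
```

With real speech a single threshold will usually split or merge some words, so expect to experiment with thresh and with a minimum word length.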

 

Your program should play the signal y and then play the words of y separated by pauses (pause(1);).
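If your boundaries are stored as frame indices, they have to be turned into sample indices before the words can be played.  The following sketch assumes the framing used by get_energy (frame z starts at sample (z-1)*(frame_len - time_overlap) + 1); the sampling rate, the stand-in signal, the boundary values and the play_audio flag are all hypothetical.

```matlab
% Assumed values, chosen to match the constants in the starter code.
frame_len    = 128;
time_overlap = 20;
hop = frame_len - time_overlap;      % frames start every hop samples

Fs = 8000;                           % hypothetical sampling rate
y  = 0.1*randn(2000, 1);             % stand-in for the recorded signal
word_boundaries = [3 5; 9 10];       % hypothetical boundaries, in frame indices

play_audio = false;                  % set to true to actually hear the words

% Frame z starts at sample (z-1)*hop + 1 and is frame_len samples long.
first_sample  = (word_boundaries(:,1) - 1)*hop + 1;
last_sample   = min((word_boundaries(:,2) - 1)*hop + frame_len, length(y));
sample_ranges = [first_sample last_sample];

for w = 1:size(sample_ranges, 1)
    word = y(sample_ranges(w,1):sample_ranges(w,2));
    if play_audio
        sound(word, Fs);
        pause(length(word)/Fs + 1);  % wait for the word to finish, then 1 s of silence
    end
end
```

The pause is lengthened by the duration of the word because sound may return before playback has finished.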

 

Try your program on the three wav files you recorded last week.  Try at least one other person’s recordings too.

 

Do not expect your program to do as well as you did by hand.  For those files where you have located the word boundaries by hand, compare your results with your program’s results.  Where did your program mess up?  What things in the speech samples kept it from working properly?
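One way to make this comparison concrete is to express each boundary difference in milliseconds.  The sketch below assumes that both sets of boundaries are given in samples and that the program found the same number of words as you did; all of the numbers in it are invented for illustration.

```matlab
Fs   = 8000;                        % hypothetical sampling rate
hand = [1200 4100; 6000 9000];      % hypothetical hand-labelled boundaries (samples)
auto = [1280 4020; 6040 9200];      % hypothetical program output (samples)

% Positive values mean the program placed the boundary later than you did.
err_ms = 1000*(auto - hand)/Fs;
mean_abs_err_ms = mean(abs(err_ms(:)));

disp(err_ms);
disp(mean_abs_err_ms);
```

When the word counts differ, the rows no longer line up, and that mismatch is itself worth discussing in your write-up.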

 

Save this code and write a new version that uses the zero-crossings to refine your algorithm.
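One common idea, sketched here only as a starting point, is to keep the energy-based boundaries and then push each one outward while the neighbouring frames have a high zero-crossing count, since weak consonants at the edges of a word often have low energy but many zero-crossings.  The zcr values, zcr_thresh and max_extend below are hypothetical and would need tuning on real recordings.

```matlab
% Toy per-frame zero-crossing counts (made-up values, one entry per frame).
zcr = [1 9 8 1 1 1 1 7 1 1 1 1];
word_boundaries = [4 6; 9 10];   % hypothetical energy-based boundaries (frames)

zcr_thresh = 5;                  % hypothetical: counts above this look like speech
max_extend = 3;                  % hypothetical cap on how far a boundary may move

refined = word_boundaries;
for w = 1:size(refined, 1)
    % Move the start earlier while the preceding frame has a high count.
    n = 0;
    while refined(w,1) > 1 && n < max_extend && zcr(refined(w,1) - 1) > zcr_thresh
        refined(w,1) = refined(w,1) - 1;
        n = n + 1;
    end
    % Move the end later while the following frame has a high count.
    n = 0;
    while refined(w,2) < length(zcr) && n < max_extend && zcr(refined(w,2) + 1) > zcr_thresh
        refined(w,2) = refined(w,2) + 1;
        n = n + 1;
    end
end
disp(refined);
```

This sketch does not check whether two extended words run into each other, and background noise can also produce high counts, so treat it as one option to evaluate, not as the expected solution.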

 

Try your second program on the same wav files you tried the first code on.

 

Compare your results from the two programs with your hand-labeling from last week.  Where did your code do a better job this time?  Where was it worse?

 

Write a page or two, with pictures, explaining your results.

 

------------------------------

function energy = get_energy(SIGNAL, FRAME_LEN, TIME_OVERLAP)

% get_energy computes the short-time energy (mean squared amplitude) of SIGNAL, frame by frame

%   SIGNAL = input signal

%   FRAME_LEN = length in samples of the window over which the energy is averaged

%   TIME_OVERLAP = number of samples by which consecutive frames overlap

%   energy is a 1 x N array, N = floor(length(SIGNAL)/(FRAME_LEN - TIME_OVERLAP))

% Harriet Fell 2004

 

window_len = FRAME_LEN - TIME_OVERLAP;   % hop size: a new frame starts every window_len samples

signal_len = length(SIGNAL);

 

% Remove the components around d.c.

signal = filtfilt([1 -1],[1 -.99], SIGNAL);

 

if (FRAME_LEN > signal_len)

    disp(['Please use a frame_len <= ',int2str(signal_len)]);

    energy = [];   % return an empty result so the output is always assigned

else

    num_windows = floor(signal_len/window_len);

    energy = zeros(1, num_windows);   % preallocate the output

 

    % Find start and end for next sample

    for z = 1:num_windows

        sample_start = (z-1)*window_len + 1;

        if (sample_start + FRAME_LEN -1 > signal_len)

            sample_end = signal_len;

        else

            sample_end = sample_start + FRAME_LEN - 1;

        end

       

        frame = signal(sample_start:sample_end);

        energy(z) = sum(frame.^2)/length(frame);

    end

end

 

return;

 

function zcr = get_zcr(SIGNAL, FRAME_LEN, TIME_OVERLAP, THRESH)

% get_zcr counts the zero-crossings in each frame of SIGNAL

%   SIGNAL = input signal

%   FRAME_LEN = length in samples of the window over which zero-crossings are counted

%   TIME_OVERLAP = number of samples by which consecutive frames overlap

%   THRESH = threshold, as a percentage of each frame's peak amplitude; smaller samples are treated as zero

%   zcr is a 1 x N array, N = floor(length(SIGNAL)/(FRAME_LEN - TIME_OVERLAP))

% Harriet Fell 2002, using ideas from Childers' ti_e_zcr.m

 

% Normalize the signal amplitude

%signal = 14000/max(abs(signal))*signal;

 

% Remove the components around d.c.

signal = filtfilt([1 -1],[1 -.99], SIGNAL);

 

% Count the zero-crossings in each frame

window_len = FRAME_LEN - TIME_OVERLAP;   % hop size: a new frame starts every window_len samples

signal_len = length(SIGNAL);

 

if (FRAME_LEN > signal_len)

    disp(['Please use a frame_len <= ',int2str(signal_len)]);

    zcr = [];   % return an empty result so the output is always assigned

else

    num_windows = floor(signal_len/window_len);

    zcr = zeros(1, num_windows);

 

    % Find start and end for next sample

    for z = 1:num_windows

        sample_start = (z-1)*window_len + 1;

        if (sample_start + FRAME_LEN -1 > signal_len)

            sample_end = signal_len;

        else

            sample_end = sample_start + FRAME_LEN - 1;

        end

       

        % Use the filtered signal, as get_energy does

        frame = signal(sample_start:sample_end);

        zcr(z) = zcr_cnt(frame, THRESH);

    end

end

return;

 

 %---------------------------------------------------------

 %  Local function

 %---------------------------------------------------------

function count = zcr_cnt(DATA, THRESH)

nmax = THRESH * max(DATA)/100;

nmin = THRESH * min(DATA)/100;

 

data_above = (DATA > nmax);     % 1 if data > nmax, 0 otherwise

data_below = (DATA < nmin);     % 1 if data < nmin, 0 otherwise

data_above = DATA.*data_above;

data_below = DATA.*data_below;

data = data_above + data_below; % values between nmin and nmax are now 0

 

% 1 where two adjacent samples have opposite sign (works for row or column vectors)

data_new = (data(1:end-1).*data(2:end) < 0);

count = sum(data_new);

 

return
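To see what the threshold in zcr_cnt does, it can help to run the same steps by hand on a very short frame.  The sketch below repeats the thresholding idea on made-up data and compares it with a plain sign-change count; it is an illustration of the idea, not a replacement for the listing above, and the names raw_count and thresh_count are introduced here only for the example.

```matlab
DATA   = [4; -4; 0.1; 4; -4; -3];   % made-up frame
THRESH = 10;                        % percent of the frame's extremes

nmax = THRESH*max(DATA)/100;        %  0.4
nmin = THRESH*min(DATA)/100;        % -0.4

% Zero every sample that lies between nmin and nmax.
data = DATA.*(DATA > nmax) + DATA.*(DATA < nmin);

% Count adjacent pairs with opposite sign, with and without the threshold.
raw_count    = sum(DATA(1:end-1).*DATA(2:end) < 0);
thresh_count = sum(data(1:end-1).*data(2:end) < 0);

disp([raw_count thresh_count]);
```

The thresholded count is smaller here because the sample 0.1 is set to zero, so the crossing that passes through it is no longer counted.  Keep this in mind when choosing THRESH.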

 


Last Updated: January 14, 2004 5:56 p.m. by

Harriet Fell
College of Computer Science, Northeastern University
360 Huntington Avenue #161CN,
Boston, MA 02115
Internet: fell@ccs.neu.edu
Phone: (617) 373-2198 / Fax: (617) 373-5121
The URL for this document is: http://www.ccs.neu.edu/home/fell/CSU610/SpeechSP2004Lab2.html