Analysis of Infant Babbles by the Early Vocalization Analyzer

Harriet J. Fell, Ph.D., College of Computer Science, Northeastern University
Linda J. Ferrier, Ph.D., Speech-Language Pathology and Audiology, Northeastern University
Carol Espy-Wilson, Ph.D., Electrical and Computer Engineering, Boston University
Susan G. Worst, M.A., College of Computer Science, Northeastern University
Eric A. Craft, M.S., Electrical and Computer Engineering, Boston University
Karen Chenausky, M.S., Speech Technology and Applied Research, Lexington, Massachusetts
Joel MacAuslan, Ph.D., Speech Technology and Applied Research, Lexington, Massachusetts
Glenna Hennessey, M.S., Speech-Language Pathology and Audiology, Northeastern University
Presented at the American Speech-Language-Hearing Convention, November 17, 2000.
This work was sponsored in part by NIH Grant #R42-HD34686.


The Early Vocalization Analyzer (EVA) is a computer program that automatically analyzes digitized recordings of infant vocalizations. EVA can clinically distinguish typically developing from non-typically developing infants strictly by acoustic analysis of an infant's syllable structure.


Considerable research supports the position that infant vocalizations effectively predict later articulation and language. Intervention to encourage babbling activity in at-risk infants is frequently recommended. However, research and clinical diagnosis of delayed or reduced babbling have so far depended on time-consuming and often unreliable perceptual analyses of tape-recorded infant sounds. While acoustically analyzing infant sounds has provided important information on the early characteristics of infant vocalizations, this information has not yet been used in automatic analysis. We are developing a program, EVA, which automatically analyzes digitized recordings of infant vocalizations.

Here, we report on our progress in extending EVA in two ways:

  1. using landmark and timing information to give an infant's vocalization age, a clinical developmental indicator.
  2. distinguishing closants (consonant-like phonemes produced by oral cavity constrictions) by manner of articulation (e.g., stop, fricative, nasal, or liquid) and place of articulation (e.g., labial, velar, palatal).


Nine typically developing infants were recorded: four male and five female. We are now following five at-risk infants, four male and one female. Of these, one has apraxia, one has Down Syndrome, and three show motor delay, one of whom was born prematurely. Of the fourteen infants, two are African-American and one is Hispanic. All but the Hispanic infant have American-English-speaking parents. Each infant was recorded eight times for 40 minutes, at approximately monthly intervals, from six to thirteen months of age.

The EVA Software

Landmark Detector

The landmark detector is built on the Liu-Stevens Landmark Detection program for adult speech, which is founded on Stevens' acoustic model of speech production. Central to this theory are landmarks, points in an utterance around which listeners extract information about the underlying distinctive features. They mark perceptual foci and articulatory targets.

The program detects three types of landmarks:

  1. g(lottis): marks the time when the vocal folds transition from freely vibrating to not freely vibrating or vice versa
  2. s(onorant): marks sonorant consonantal closures and releases (e.g., nasals)
  3. b(urst): designates stop/affricate bursts and points where aspiration/frication ends due to stop closure


EVA uses the landmark types and times output by the Landmark Detection program to:
  1. Group landmarks into "standard" syllable types such as "+g/-s/-g", using information about their order and spacing. For each subject at each month, it creates a profile using the number of syllables, the number of distinct syllables, and the length of each syllable type.
  2. Group syllables into "utterances"--series of syllables occurring closely together--based on timing considerations. For each subject and subject-month, it reports the average number of syllables per utterance, as well as the number of utterances consisting of 1, 2, 3, and more syllables.
  3. Remove landmarks from areas of the recording that have been corrupted by noise, as well as landmarks produced as artifacts of the process.
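
The utterance grouping in step 2 can be illustrated with a minimal Python sketch. It is not EVA's code: the Syllable record, the function names, and the 0.3-second gap threshold are all assumptions made for illustration, since the timing rule EVA actually applies is not given here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Syllable:
    """Hypothetical record for one detected syllable."""
    pattern: str   # landmark sequence, e.g. "+g/-s/-g"
    start: float   # onset time in seconds
    end: float     # offset time in seconds


def group_into_utterances(syllables: List[Syllable],
                          max_gap: float = 0.3) -> List[List[Syllable]]:
    """Group syllables whose gap to the previous syllable is <= max_gap.

    max_gap is an assumed threshold, not a value taken from EVA.
    """
    utterances: List[List[Syllable]] = []
    for syl in sorted(syllables, key=lambda s: s.start):
        if utterances and syl.start - utterances[-1][-1].end <= max_gap:
            utterances[-1].append(syl)   # close enough: same utterance
        else:
            utterances.append([syl])     # otherwise start a new utterance
    return utterances


def utterance_profile(utterances: List[List[Syllable]]
                      ) -> Tuple[float, Dict[str, int]]:
    """Return mean syllables per utterance and counts of 1, 2, 3, and more."""
    sizes = [len(u) for u in utterances]
    counts = {"1": 0, "2": 0, "3": 0, "more": 0}
    for n in sizes:
        counts[str(n) if n <= 3 else "more"] += 1
    mean = sum(sizes) / len(sizes) if sizes else 0.0
    return mean, counts


# Made-up timings, for illustration only.
demo = [
    Syllable("+g/-g", 0.0, 0.2),
    Syllable("+g/-s/-g", 0.3, 0.5),
    Syllable("+g/-g", 2.0, 2.2),
]
demo_utterances = group_into_utterances(demo)
demo_mean, demo_counts = utterance_profile(demo_utterances)
```

A single gap threshold is the simplest possible timing rule; any real grouping criterion would need to be tuned against recordings.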

The Phonetic Classifier

A version of Carol Espy-Wilson's classifier EBS (Event Based System, 1995) has been adapted by Eric Craft, Boston University for use with infant vocalizations. This program labels the following features (see figure 1):
SY Syllabic
SC Sonorant Consonant
A Affricate
LS Labial Stop
AS Alveolar stop or Aspiration?
VS Velar stop
AF Alveolar fricative
PF Palatal fricative
WF Weak fricative
SL Silence
V Vowel
Fr Fricative
ST Stop
CL Closure

Finding the Vocalization Age

  1. We started with 95 observable acoustic features taken across 63 typically developing infants' sessions. The data were normalized to produce statistics with zero mean and approximately unit standard deviation (noise level).
  2. We performed a principal components (PC) analysis on the 95 normalized features from all 63 sessions and discarded PCs with weights on the order of the noise level. The remaining 57 PCs provide an estimate of the detectable structure in the data.
  3. We performed an optimal linear fit to chronological age using these 57 PCs. The 38 coefficients (95 minus 57) of smallest magnitude were set to exactly zero. The result is a linear fit with only 57 coefficients and residuals of only 1.46 months, just slightly more than the 1.24 months of the (unrealistic) optimal fit retaining all 95 coefficients.
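
The three steps above can be sketched in Python with NumPy. This is a sketch under stated assumptions, not the authors' implementation: the function name is hypothetical, PCA is done with a plain singular value decomposition, the fit is ordinary least squares, and the data are random numbers that only match the stated dimensions (63 sessions, 95 features, 57 retained PCs).

```python
import numpy as np


def fit_vocalization_age(X, age, n_pcs, n_keep):
    """Hypothetical helper: PCA-regularized linear fit with small weights zeroed.

    X      : (sessions, features) raw feature matrix
    age    : (sessions,) chronological age in months
    n_pcs  : number of principal components retained
    n_keep : number of feature coefficients left nonzero
    """
    # Step 1: normalize each feature to zero mean and unit standard deviation.
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                      # guard against constant features
    Z = (X - mu) / sd

    # Step 2: PCA via SVD; keep the n_pcs leading directions.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_pcs].T                       # (features, n_pcs)
    scores = Z @ V                         # (sessions, n_pcs)

    # Step 3: least-squares fit of centered age on the PC scores,
    # mapped back to one coefficient per original feature.
    age_mean = age.mean()
    w, *_ = np.linalg.lstsq(scores, age - age_mean, rcond=None)
    beta = V @ w

    # Zero the coefficients of smallest magnitude, keeping n_keep of them.
    drop = np.argsort(np.abs(beta))[: beta.size - n_keep]
    beta_sparse = beta.copy()
    beta_sparse[drop] = 0.0

    def predict(X_new):
        return ((X_new - mu) / sd) @ beta_sparse + age_mean

    return beta_sparse, predict


# Synthetic stand-in data with the dimensions quoted above (not study data).
rng = np.random.default_rng(0)
X = rng.normal(size=(63, 95))
age = rng.uniform(6.0, 13.0, size=63)
beta, predict = fit_vocalization_age(X, age, n_pcs=57, n_keep=57)
residual_sd = float(np.std(age - predict(X)))
```

Because the inputs are random, the residual from this sketch says nothing about the 1.46-month figure reported above; the example only shows the shape of the computation.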

Coefficients in the EVA Vocalization Age

Syllables per Utterance
  1     0.6048
  2    -1.4200
  3    -0.4494
Initial Landmark
  +b   -0.7519
  +g    0.3291
  +s    1.0386
Final Landmark
  -b    0
  -g    0.5945
  -s   -1.3130
  +s    0.2968
Landmark Sequence Length
  2     0
  3    -0.3000
  4     1.5889
  5     0.8499
  6     0
  7     0
Syllable Distribution
Average Duration

Diagnostic Value of the Vocalization Age

The following graphs show the Vocalization Age versus the Actual Age of the infants in our study. The first graph shows the results for typically developing children only, and the second shows the results for non-typically developing children (a key follows). The third graph shows both groups combined, with magenta + marks for the typically developing children and cyan o marks for the non-typically developing children. The lines are at Vocalization Age = Actual Age and at Vocalization Age = Actual Age +/- one or two standard deviations (one standard deviation is 1.46 months).
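
The reference lines in these graphs amount to asking how many standard deviations a session's Vocalization Age lies from the infant's Actual Age. The short Python sketch below makes that concrete. The function names and band labels are hypothetical, and the bands only mirror the plotted lines; no diagnostic cutoff is stated here, so none is implied.

```python
SD_MONTHS = 1.46  # residual standard deviation reported for the fit


def deviation_in_sd(vocalization_age: float, actual_age: float,
                    sd: float = SD_MONTHS) -> float:
    """Signed distance of Vocalization Age from Actual Age, in SD units."""
    return (vocalization_age - actual_age) / sd


def band(vocalization_age: float, actual_age: float,
         sd: float = SD_MONTHS) -> str:
    """Name the band between the plotted reference lines (labels are illustrative)."""
    z = abs(deviation_in_sd(vocalization_age, actual_age, sd))
    if z <= 1.0:
        return "within 1 SD"
    if z <= 2.0:
        return "within 2 SD"
    return "beyond 2 SD"


# Made-up ages in months, for illustration only.
examples = [(8.0, 8.5), (6.0, 8.0), (5.0, 9.0)]
labels = [band(va, aa) for va, aa in examples]
```

A negative value from deviation_in_sd means the Vocalization Age falls below the Actual Age.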

The Non-typical Subjects

  • blue dot - Severe Apraxia
  • magenta triangle - Motor Delay
  • cyan circle - Born at 32 weeks weighing 3 lbs. 9 oz.; Down Syndrome, multiple difficulties.
  • red star - Motor Delay, multiple difficulties including a heart murmur, fused kidneys, and a tracheoesophageal fistula.
  • black X - Born at 28 weeks weighing 1 lb. 15 oz.; was tracheostomized.

Harriet Fell
College of Computer Science, Northeastern University
360 Huntington Avenue #161CN,
Boston, MA 02115
Phone: (617) 373-2198 / Fax: (617) 373-5121

Last Updated: November 11, 2000 7:20 p.m.