Predicting the birth of a spoken word

Brandon C Roy, Michael C Frank, Philip DeCamp, Matthew Miller, Deb Roy

Abstract

Children learn words through an accumulation of interactions grounded in context. Although many factors in the learning environment have been shown to contribute to word learning in individual studies, no empirical synthesis connects across factors. We introduce a new ultradense corpus of audio and video recordings of a single child's life that allows us to measure the child's experience of each word in his vocabulary. This corpus provides the first direct comparison, to our knowledge, between different predictors of the child's production of individual words. We develop a series of new measures of the distinctiveness of the spatial, temporal, and linguistic contexts in which a word appears, and show that these measures are stronger predictors of learning than frequency of use and that, unlike frequency, they play a consistent role across different syntactic categories. Our findings provide a concrete instantiation of classic ideas about the role of coherent activities in word learning and demonstrate the value of multimodal data in understanding children's language acquisition.

Keywords: diary study; language acquisition; multimodal corpus analysis; word learning.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. S1.
Site of the Human Speechome Project, where all recording took place. Also shown are the ceiling-mounted camera with an open privacy shutter, the microphone, the recording controller, and a view into the living room.
Fig. S2.
Schematic of data collection and processing for our dataset, leading to our outcome (blue) and predictor (red) variables. (A) Audio recordings are filtered automatically for speech and speaker identity and then transcribed. Transcripts are used for the identification of the child’s productions, extraction of frequency, MLU, and temporal distinctiveness predictors, as well as for clustering via topic models (LDA) to extract the linguistic distinctiveness measure. (B) Video recordings are processed via motion-based clustering. Region-of-motion distributions for each word are then compared with a base motion distribution for all linguistic events, yielding the spatial distinctiveness predictor.
Fig. S3.
Speaker identification performance curves. Classifier yield is the fraction of the speaker classifications above a confidence threshold (A), and accuracy is the fraction of above-threshold classifications correct for each speaker (B). (C) Receiver operating characteristic (ROC) curve displays the relationship between true-positive (TPr) and false-positive (FPr) rates for each speaker as the confidence threshold is varied.
Fig. S4.
Counts for the word “star” by month. Child-labeled counts are shown in red, whereas total counts across all speakers are shown in gray.
Fig. S5.
Screenshot of the Word Birth Browser tool showing the main window (Left) and context window (Right). In the main window, the left pane is used to select a word to review, and the right pane presents all utterances containing the target word, which can be sorted by different attributes. The context window presents the utterances that surround the selected utterance within a temporal window of 1–2 min.
Fig. S6.
Child’s word birth count (A) and MLU (B) by month (95% confidence interval shaded). The child’s total vocabulary is increasing across the full 9- to 24-mo age range, but the growth rate exhibits an increase up to 18 mo of age, followed by a decline. However, MLU remains relatively flat (at ∼1) until 18 mo. Num, number.
Fig. S7.
Overall breakdown of spoken language over time for each speaker. The proportion of word tokens produced (Top) and the proportion of transcripts produced (Bottom) are shown.
Fig. 1.
Regression coefficients (±SE) for each predictor in a linear model predicting AoFP. Each grouping of bars indicates a separate model: a baseline model with only the number of phonemes, MLU, and frequency or a model that includes one of the three distinctiveness predictors. Red/orange/purple bars indicate distinctiveness predictors (spatial/temporal/linguistic). Coefficients represent number of days earlier/later that the child will first produce a word per SD difference on a predictor. (Right) Three plots show these models for subsets of the vocabulary.
Fig. 2.
Predicted AoFP plotted by true AoFP for successive regression models. Each dot represents a single word, with selected words labeled and lines showing the change in prediction due to the additional predictor for those words. Color denotes word category, the dotted line shows the regression trend, and the dashed line shows perfect prediction. (Left) Plot shows the baseline model, which includes frequency, phonemes, and utterance length. (Right) Subsequent three plots show change due to each distinctiveness predictor when added to the baseline model. An interactive version of this analysis is available at wordbirths.stanford.edu/.
Fig. 3.
Examples of eight spatial, temporal, and linguistic context distributions for words. Spatial distributions show the regions of the house where the word was more (red) and less (blue) likely than baseline to be used. Rooms are labeled in the topmost plot. Temporal distributions show the use of the target word throughout the day, grouped into 1-h bins (orange) and compared with baseline (gray). Linguistic distributions show the distribution of the word across topics (purple), compared with the baseline distribution (gray). The top five words from the three topics in which the target word was most active are shown above the topic distribution.
Fig. S8.
Pearson (A) and Spearman (B) correlation (corr.) coefficients between all pairs of predictors. Frequency and number of phonemes are most strongly correlated, an indication that longer words tend to be used less frequently [first noted by Zipf (43)]. The red box shows correlations for distinctiveness predictors. freq, frequency; utt, utterance. ***P≤0.001; **P≤0.01; *P≤0.05.
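
The yield and accuracy definitions given in the Fig. S3 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `yield_and_accuracy`, the tuple layout of the input, and the toy data are hypothetical, and two choices are assumptions the caption does not settle: "above" is read as strictly greater than the threshold, and per-speaker accuracy is grouped by the predicted speaker.

```python
from collections import defaultdict


def yield_and_accuracy(predictions, threshold):
    """Compute classifier yield and per-speaker accuracy at one threshold.

    predictions: list of (true_speaker, predicted_speaker, confidence) tuples.
    This layout is a hypothetical one chosen for illustration.
    """
    # Keep only classifications strictly above the confidence threshold
    # (assumption: "above" means strictly greater than).
    kept = [p for p in predictions if p[2] > threshold]

    # Yield: fraction of all classifications that survive the threshold.
    yield_fraction = len(kept) / len(predictions) if predictions else 0.0

    # Accuracy: fraction of surviving classifications that are correct,
    # grouped by predicted speaker (assumption, see lead-in).
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_speaker, predicted_speaker, _confidence in kept:
        total[predicted_speaker] += 1
        if predicted_speaker == true_speaker:
            correct[predicted_speaker] += 1
    accuracy = {speaker: correct[speaker] / total[speaker] for speaker in total}

    return yield_fraction, accuracy


# Toy data, invented for illustration only.
toy_predictions = [
    ("child", "child", 0.9),
    ("mother", "child", 0.8),
    ("mother", "mother", 0.95),
    ("father", "father", 0.4),
]
toy_yield, toy_accuracy = yield_and_accuracy(toy_predictions, threshold=0.5)
```

Sweeping `threshold` over a range of values and recording the two outputs at each step would trace curves of the kind described for panels A and B; raising the threshold lowers yield and typically raises accuracy.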

Source: PubMed
