Selecting an acoustic correlate for automated measurement of American English rhotic production in children

Heather Campbell, Daphna Harel, Elaine Hitchcock, Tara McAllister Byun

Abstract

Purpose: A current need in the field of speech-language pathology is the development of reliable and efficient techniques to evaluate accuracy of speech targets over the course of treatment. As acoustic measurement techniques improve, it should become possible to use automated scoring in lieu of ratings from a trained clinician in some contexts. This study asks which acoustic measures correspond most closely with expert ratings of children's productions of American English /ɹ/ in an effort to develop an automated scoring algorithm for use in treatment targeting rhotics.

Method: A series of ordinal mixed-effects regression models were fit over a large sample of children's productions of words containing /ɹ/ that had previously been rated by three trained clinicians. Akaike/Bayesian Information Criteria were used to select the best-fitting model.

Result: Controlling for age, sex, and allophonic contextual differences, the measure that accounted for the most variance in speech rating was F3-F2 distance normalised relative to a sample of age- and sex-matched speakers.

Conclusion: We recommend this acoustic measure for use in future automated scoring of children's production of American English rhotics. We also suggest that computer-based treatment with automated scoring should facilitate increases in treatment dosage by improving options for home practice.
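As an illustration of the recommended measure, the sketch below computes an F3-F2 distance and expresses it relative to an age- and sex-matched reference group. The abstract does not state the normalisation formula, so the z-score form used here is an assumption, and the reference means and standard deviations, the `Norm` class, and the function name are hypothetical placeholders rather than values or code from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Norm:
    mean_hz: float  # mean F3-F2 distance in the reference group (Hz)
    sd_hz: float    # standard deviation of F3-F2 distance in that group (Hz)


# Hypothetical placeholder norms keyed by (age in years, sex).
# These numbers are NOT taken from the study; substitute real normative data.
REFERENCE = {
    (8, "F"): Norm(mean_hz=700.0, sd_hz=200.0),
    (8, "M"): Norm(mean_hz=650.0, sd_hz=180.0),
}


def normalised_f3_f2(f2_hz: float, f3_hz: float, age: int, sex: str) -> float:
    """Return the F3-F2 distance as a z-score against the matched group.

    Assumption: normalisation is a z-score, (distance - mean) / sd.
    """
    norm = REFERENCE[(age, sex)]
    distance_hz = f3_hz - f2_hz
    return (distance_hz - norm.mean_hz) / norm.sd_hz
```

Under this assumed form, a score near zero means the F3-F2 distance is typical for the speaker's age and sex group, and larger positive scores mean F3 sits further above F2 than is typical for that group.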

Keywords: Human speech; biofeedback therapy; linear-mixed effects models; ordinal regression analysis; speech pathology; speech sound disorders.

Conflict of interest statement

Declaration of interest

No potential conflict of interest was reported by the authors.

Figures

Figure 1
Formant frequencies represented as peaks of an LPC spectral display, with a line representing an accurate rhotic target, currently set at 1646 Hz. Incorrect /r/ (top panel) is characterised by a relatively high F3, while correct /r/ (bottom panel) is characterised by a relatively low F3. Images from the “staRt” app (McAllister Byun et al., 2017).
Figure 2
All models predicting mean perceptual rating of accuracy. All models included five structural variables. In addition to this base, one acoustic variable was added to each model. Each of the eight acoustic variables was run with each of four interaction possibilities, for a total of 32 models.
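The count of 32 comes from crossing the eight acoustic variables with the four interaction possibilities. The sketch below enumerates such a grid and shows the standard AIC and BIC formulas that could be used to rank the fitted models. The variable and interaction names are placeholders (the caption does not list them), and fitting the ordinal mixed-effects models themselves is outside the scope of this sketch.

```python
import math
from itertools import product

# Placeholder labels: the caption gives only the counts, not the names.
ACOUSTIC_VARIABLES = [f"acoustic_{i}" for i in range(1, 9)]          # 8 variables
INTERACTION_OPTIONS = [f"interaction_option_{j}" for j in range(1, 5)]  # 4 options

# One candidate model per (acoustic variable, interaction option) pair.
model_grid = list(product(ACOUSTIC_VARIABLES, INTERACTION_OPTIONS))


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def best_model(scores: dict) -> tuple:
    """Return the model key with the lowest criterion value (lower is better)."""
    return min(scores, key=scores.get)
```

Given a dictionary that maps each pair in `model_grid` to its AIC or BIC, `best_model` returns the pair with the lowest value, which is the usual convention for both criteria.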

Source: PubMed
