Deriving gradient measures of child speech from crowdsourced ratings

Tara McAllister Byun, Daphna Harel, Peter F Halpin, Daniel Szeredi

Abstract

Recent research has demonstrated that perceptual ratings aggregated across multiple non-expert listeners can reveal gradient degrees of contrast between sounds that listeners might transcribe identically. Aggregated ratings have been found to correlate strongly with acoustic gold standard measures both when individual raters use a continuous rating scale such as visual analog scaling (VAS; Munson et al., 2012) and when individual raters provide binary ratings (McAllister Byun, Halpin, & Szeredi, 2015). In light of evidence that inexperienced listeners use continuous scales less consistently than experienced listeners, this study investigated the relative merits of binary versus continuous rating scales when aggregating responses over large numbers of naive listeners recruited through online crowdsourcing. Stimuli were words produced by children in treatment for misarticulation of North American English /r/. Each listener rated the same 40 tokens two times: once using VAS and once using a binary rating scale. The gradient rhoticity of each item was then estimated using (a) VAS click location, averaged across raters, and (b) the proportion of raters who assigned the "correct /r/" label to the item in the binary rating task (p̂). First, we validate these two measures of rhoticity against each other and against an acoustic gold standard. Second, we explore the range of variability in individual response patterns that underlie these group-level data. Third, we integrate statistical, theoretical, and practical considerations to offer guidelines for determining which measure to use in a given situation.
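The two aggregation schemes described above reduce to simple token-level summaries of a raters-by-tokens response matrix: the per-token mean of continuous VAS click locations, and p̂, the per-token proportion of "correct /r/" votes. The following is a minimal sketch of that computation, not the authors' code; the variable names, the rater and token counts, and the simulated rating and acoustic data are all hypothetical stand-ins for illustration.

```python
# Minimal sketch of aggregating crowdsourced speech ratings into two
# gradient rhoticity measures: mean VAS click location and p-hat.
# All data below are simulated placeholders, not the study's data.
import numpy as np

rng = np.random.default_rng(0)
n_raters, n_tokens = 25, 40  # hypothetical panel size; 40 tokens as in the study

# Continuous ratings: VAS click locations scaled to [0, 1] per rater/token.
vas_clicks = rng.uniform(0.0, 1.0, size=(n_raters, n_tokens))
# Binary ratings: 1 = "correct /r/", 0 = "not a correct /r/".
binary_votes = rng.integers(0, 2, size=(n_raters, n_tokens))

# (a) Gradient measure from the continuous task: mean click location
# per token, averaged across raters.
mean_vas = vas_clicks.mean(axis=0)

# (b) Gradient measure from the binary task: p-hat, the proportion of
# raters labeling each token "correct /r/".
p_hat = binary_votes.mean(axis=0)

# Validation against an acoustic gold standard (F3-F2 distance; smaller
# values indicate more rhotic productions), simulated here.
f3_f2 = rng.uniform(500.0, 2500.0, size=n_tokens)
print(np.corrcoef(mean_vas, p_hat)[0, 1])  # VAS vs. p-hat
print(np.corrcoef(mean_vas, f3_f2)[0, 1])  # VAS vs. acoustics
print(np.corrcoef(p_hat, f3_f2)[0, 1])     # p-hat vs. acoustics
```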

Keywords: Covert contrast; Crowdsourcing; Research methods; Speech perception; Speech rating; Speech sound disorders; Visual analog scaling.

Copyright © 2016 Elsevier Inc. All rights reserved.

Figures

Figure A1. Screenshot of the VAS rating interface.

Figure 1. Correlation between (a) p̂ and mean VAS click location, (b) mean VAS click location and F3-F2 distance, and (c) p̂ and F3-F2 distance.

Figure 2. Histogram of the rater bias measure (proportion of tokens rated "correct /r/") for the binary response task.

Figure 3. Density plots of VAS click locations for three selected raters.

Figure 4. Correlation, across individual raters, between binary-acoustic agreement (difference in mean F3-F2 distance between sets of tokens rated "correct /r/" vs. "not a correct /r/") and VAS-acoustic agreement (proportion of variation in VAS ratings explained by F3-F2 distance).

Figure 5. VAS click locations compared to F3-F2 values for three individual raters representing (a) higher performance on VAS, (b) higher categoricity, and (c) overall lower performance.

Source: PubMed
