Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users

Georgios Mantokoudis, Claudia Dähler, Patrick Dubach, Martin Kompis, Marco D Caversaccio, Pascal Senn

Abstract

Objective: To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users.

Methods: Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations at different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech rates (three different speakers), web cameras (Logitech Pro9000, C600, and C500), and image/sound delays (0–500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for a live Skype™ video connection and for live face-to-face communication were assessed.

Results: A higher frame rate (>7 fps), higher camera resolution (>640 × 480 px), and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores depended strongly on the speaker but were not influenced by the physical properties of the camera optics or by full screen mode. Adding visual cues yielded a significant median gain in speech perception of +8.5 percentage points (p = 0.009) across all 21 CI users. CI users with poor open-set speech perception scores (n = 11) benefited most from combined audio-visual presentation (median gain +11.8 percentage points, p = 0.032).

Conclusion: Web cameras have the potential to improve telecommunication for hearing-impaired individuals.

Conflict of interest statement

Competing Interests: The company Logitech Europe S.A. provided commercially available products for this study. This does not alter the authors’ adherence to all the PLOS ONE policies on sharing data and materials.

Figures

Figure 1. Boxplots demonstrating lower quartile, median, and upper quartile, with whiskers representing 1.5 times the interquartile range (X = outliers): speech reading performance (correctly repeated words in percent) of 14 deaf individuals using (A) the same high-definition web camera (Logitech Pro9000) and different speakers (CD, medical student, 97 words/min; JB, actress, 161 words/min; SF, speech therapist, 178 words/min), (B) the same speaker (CD) but different communication modes, and (C) the same speaker (SF) with three different webcams: Logitech Pro9000, Logitech C600, and Logitech C500.
Figure 2. Speech reading performance (mean ±1 SD) of n = 14 deaf individuals for 4 different spatial resolutions (A) and 5 different frame rates (B).
In B, the maximum speech perception achieved at 30 fps is set to 100% (relative data). Mean speech perception scores remained above 80% down to a frame rate of 10 frames per second.

Figure 3. Speech reading capability of cochlear implant users.
A. Comparison of speech perception scores in the absence of auditory input for n = 10 proficient (pCI) and n = 11 non-proficient (npCI) CI users for two visual communication modes (face-to-face without their implant activated vs. Skype™ video only). B. Boxplots showing speech reading scores for each condition and group.

Figure 4. CI users and audio-visual gain for Skype™ transmission.
A. Speech perception scores of n = 10 proficient (pCI) and n = 11 non-proficient (npCI) CI users for exclusively auditory input vs. audio-visual input. B. Non-proficient CI users and the two groups combined (all CI) showed a statistically significant audio-visual gain (boxplots). Proficient CI users showed a non-significant trend toward an AV gain.

Figure 5. Audiovisual delay.
Bimodal mean speech perception (±1 SD) is plotted against audio-visual delay (the auditory signal precedes the image) for n = 10 proficient (pCI) and n = 11 non-proficient (npCI) CI users. Fusion of incongruent auditory and visual stimuli is no longer possible beyond a delay of 200 ms for npCI and 300 ms for pCI users. Intelligibility improved again at long AV delays because CI users no longer tried to fuse the two incongruent signals and instead relied on one stimulus or the other.


Source: PubMed
