Speech Accessibility Project (SAP)

Updated December 4, 2023 by: University of Illinois at Urbana-Champaign

People With Speech Disabilities Contributing Speech Samples for Improved Accessibility of Speech-Enabled Devices

The goal of the Speech Accessibility Project at the University of Illinois Beckman Institute (https://speechaccessibilityproject.beckman.illinois.edu) is to collect, annotate, and curate a shared database of speech samples from people with atypical speech, and to share this dataset with researchers at other organizations. This two-year project plans to collect 1,200,000 speech samples from 2,000 people, each of whom will provide 600 samples. In Year 1, the initial focus will be on people with Parkinson's disease. In Year 2, people with four additional etiologies of interest will be recruited: Amyotrophic Lateral Sclerosis (ALS), Cerebral Palsy (CP), Down Syndrome (DS), and Stroke. UIUC will build an open-source software infrastructure to collect annotated speech samples and to share these data securely with researchers from our partner technology companies (and eventually with other organizations as well), so that they can use the data to improve their automatic speech recognition algorithms. This project promotes diversity, equity, and inclusion by helping technology companies fully support all types of speech; having one centralized "collector" of speech samples is also more efficient and less burdensome for these specialized patient populations.

Study Overview

Detailed Description

The goal of our project is to collect 1,200,000 speech samples from 2,000 people with dysarthria: we expect to collect data from 400 people in each of five different patient populations. Each person will provide 600 speech samples.

(600 samples/person x 400 persons/etiology x 5 etiologies = 1,200,000 samples)
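The enrollment arithmetic above, and the Year 1 / Year 2 split in the schedule that follows, can be checked with a short Python sketch. The constant names are illustrative assumptions, not part of the study materials.

```python
# Illustrative check of the planned enrollment arithmetic (names are hypothetical).
SAMPLES_PER_PERSON = 600
PERSONS_PER_ETIOLOGY = 400
ETIOLOGIES = ["Parkinson's Disease", "ALS", "CP", "DS", "Stroke"]

total_people = PERSONS_PER_ETIOLOGY * len(ETIOLOGIES)
total_samples = SAMPLES_PER_PERSON * total_people

# Year 1 covers Parkinson's only; Year 2 covers the remaining four etiologies.
year1_people = PERSONS_PER_ETIOLOGY
year2_people = PERSONS_PER_ETIOLOGY * (len(ETIOLOGIES) - 1)

print(total_people, total_samples, year1_people, year2_people)
# prints: 2000 1200000 400 1600
```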

Our schedule of research procedures is:

  1. February or March 2023 through August 2023: data collection of speech samples from 400 people with Parkinson's disease.
  2. August 2023 through August 2024: data collection of speech samples from 1,600 people with ALS, CP, DS, and Stroke.

Data collection of speech samples in Year 1 will be a collaboration between the University of Illinois and mentors from Lee Silverman Voice Therapy (LSVT) Global. Potential participants will be screened both with a questionnaire and by providing a short set of "quality control" speech samples. Participants who do not pass screening will be thanked for their interest. Otherwise, the participant is eligible for the study and may complete the informed consent process and then begin contributing speech samples.

Participants can make as many recordings as they wish, at whatever time of day is convenient for them, and can log in to the system at any time, 24/7.

In Year 2, this procedure will be repeated with people from the other four etiologies, with additional advocacy organizations as partners.

Participants who are unable to read text from the computer screen will be offered the opportunity to record speech using a verbal-repetition protocol. To participate in this protocol, a participant must be accompanied by a caregiver who is also willing to be recorded. If a participant agrees to this protocol, the caregiver will read each prompt aloud to the participant, who will then repeat the words spoken by the caregiver or respond to any question the caregiver asks.

Participants also have the option to provide additional data about themselves, such as their age, race and ethnicity, and the year of their diagnosis. These "metadata tags" are completely optional but are helpful for analysis.

The collected speech samples will be stored securely in a custom database built by the UIUC Beckman Institute. All samples are stored with a unique participant ID code. All samples are annotated by our UIUC research team with technical information about the acoustic waveform and other information.

The entire database of speech samples will be shared with our coalition partners (Amazon, Apple, Google, Meta, and Microsoft), and, after all data collection is complete, with other universities and companies who are willing to sign our data use agreement. Each partner has signed a data use agreement with UIUC that allows these deidentified data to be used for improvements in speech recognition technology and assures the privacy of participants and confidentiality of data.

Study Type

Observational

Enrollment (Estimated)

2000

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

  • Adult
  • Older Adult

Accepts Healthy Volunteers

No

Sampling Method

Non-Probability Sample

Study Population

People currently residing in the United States, excluding the states of Washington, Texas, and Illinois, who have a speech disability related to a neuromotor disorder.

Description

Inclusion Criteria:

  • Adult (age >= 18 years)
  • Self-reported diagnosis of Parkinson's Disease, ALS, CP, DS, or Stroke
  • Reads and speaks English in the form of complete sentences
  • Has a valid email address
  • Able to access a web browser to participate in the study

Exclusion Criteria:

  • Resides in the state of Washington, Texas, or Illinois (these states have privacy laws that would not allow us to collect 'voice prints')
  • Initial speech samples fail quality control screening, either because of poor data quality (e.g., a poor-quality recording environment) or because the person's speech is "too typical" and not sufficiently interesting for continued collection
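As an illustration only, the listed inclusion and exclusion criteria can be read as a single boolean check. The sketch below is a hypothetical Python helper, not the project's screening software: the `Applicant` record, its field names, and the two-letter state codes are assumptions, and the real screening also involves a questionnaire and human review of the quality-control recordings.

```python
from dataclasses import dataclass


# Hypothetical applicant record; field names are illustrative only.
@dataclass
class Applicant:
    age: int
    state: str                    # assumed two-letter U.S. state code of residence
    diagnosis: str                # self-reported diagnosis
    reads_speaks_english: bool
    has_valid_email: bool
    has_web_browser: bool
    passed_quality_control: bool  # outcome of the initial QC speech samples


ELIGIBLE_DIAGNOSES = {"Parkinson's Disease", "ALS", "CP", "DS", "Stroke"}
EXCLUDED_STATES = {"WA", "TX", "IL"}  # states whose privacy laws bar voice-print collection


def is_eligible(a: Applicant) -> bool:
    """Return True only if every inclusion criterion holds and no exclusion applies."""
    meets_inclusion = (
        a.age >= 18
        and a.diagnosis in ELIGIBLE_DIAGNOSES
        and a.reads_speaks_english
        and a.has_valid_email
        and a.has_web_browser
    )
    excluded = a.state in EXCLUDED_STATES or not a.passed_quality_control
    return meets_inclusion and not excluded


print(is_eligible(Applicant(67, "OH", "Parkinson's Disease", True, True, True, True)))  # True
print(is_eligible(Applicant(67, "TX", "Parkinson's Disease", True, True, True, True)))  # False
```

This sketch encodes only the criteria as listed; it does not model the verbal-repetition option for participants who cannot read from the screen.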

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

What is the study measuring?

Primary Outcome Measures

Outcome Measure: Recorded Speech
Measure Description: Each participant records 600 sentences: 480 read sentences and 120 spontaneous sentences recorded in response to 30 prompts.
Time Frame: 3-7 hours, self-paced
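The per-participant figures in the outcome measure can be broken down with simple arithmetic. This Python sketch uses illustrative constant names; the per-prompt figure is only an average, since the record does not say how responses are distributed across the 30 prompts.

```python
# Illustrative breakdown of the per-participant recording protocol (names are hypothetical).
READ_SENTENCES = 480
SPONTANEOUS_SENTENCES = 120
SPONTANEOUS_PROMPTS = 30
MIN_HOURS, MAX_HOURS = 3, 7

total = READ_SENTENCES + SPONTANEOUS_SENTENCES
avg_per_prompt = SPONTANEOUS_SENTENCES / SPONTANEOUS_PROMPTS  # average, not a fixed quota

# Average time budget per recorded sentence, in seconds, at each end of the 3-7 hour range.
min_sec = MIN_HOURS * 3600 / total
max_sec = MAX_HOURS * 3600 / total

print(total, avg_per_prompt, min_sec, max_sec)
# prints: 600 4.0 18.0 42.0
```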

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Investigators

  • Principal Investigator: Mark A Hasegawa-Johnson, Ph.D., University of Illinois at Urbana-Champaign

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

March 15, 2023

Primary Completion (Estimated)

May 31, 2024

Study Completion (Estimated)

May 31, 2024

Study Registration Dates

First Submitted

May 24, 2023

First Submitted That Met QC Criteria

May 24, 2023

First Posted (Actual)

June 5, 2023

Study Record Updates

Last Update Posted (Estimated)

December 11, 2023

Last Update Submitted That Met QC Criteria

December 4, 2023

Last Verified

December 1, 2023

More Information

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

YES

IPD Plan Description

Participant speech data, and text transcripts created by University of Illinois annotators, will be distributed. Each data distribution will consist of a single archive file, compressed and encrypted using a secure key technology, so that it can only be decompressed by individual researchers who have been authorized under the terms of the data use agreement. Data distributions to coalition partners will never contain any information contributed by a participant other than their recorded speech samples.

IPD Sharing Time Frame

Data will be available to sponsoring organizations beginning in June 2023, and to other research organizations beginning in April 2024 and in perpetuity thereafter. A standing review committee will be established at the University of Illinois to review applications for access to this dataset at any time in the future.

IPD Sharing Access Criteria

Before any research institution is permitted to receive any speech audio recordings, it will be required to sign a data use agreement. Key terms of the data use agreement will include commitments that the research institution (a) is an organization or individual with the legal status necessary to sign a contract, (b) will not make the speech audio recordings available to any individual who is not bound by the member's signature on the data use agreement, (c) will store the data securely to prevent data theft, and (d) will not seek to identify any of the participants.

IPD Sharing Supporting Information Type

  • Informed Consent Form (ICF)

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.
