Social Media as a Risk Tool for HIV Prevention Needs (SMaaRT)

November 8, 2025 updated by: Jessica Haberer, MD, Massachusetts General Hospital

Use of Sentiment Analysis and Social Media to Understand HIV Prevention Needs Among Young Women in Kenya

The impact of effective HIV prevention tools is limited because many people do not know that they are at risk for HIV acquisition, despite the availability of various risk assessment scores and criteria. This proposal aims to use a novel data science approach, namely topic modeling and network analysis of text (SMS) and/or social media messages (e.g., WhatsApp, Instagram, Twitter), to assess HIV prevention needs among 400 young women in Kisumu, Kenya. The study will also involve an in-depth assessment of relevant ethical and logistical factors to ensure appropriate and optimized use of a sentiment analysis tool in routine clinical care.

Study Overview

Status

Completed

Detailed Description

In the Social Media as a Risk Tool (SMaaRT) Study, the investigators hypothesize that measures derived from topic modeling and network analysis of SMS/social media data from young women in Kenya will correlate well with existing HIV risk scales and ultimately yield a better understanding of HIV prevention needs. The investigators propose the following aims:

  1. Explore ethical factors that may influence analysis of SMS and social media messages. Research assistants will conduct individual qualitative interviews with up to 32 young women (16 who would and 16 who would not provide SMS/social media data, stratified among four clinic sites) and one focus group of five Kenyan bioethicists. Questions will explore ethical concerns from individual and bystander (e.g., contacts involved in SMS/social media) perspectives and differences in ethical issues by type of social media (e.g., conversations vs posts). Follow-up interviews will be conducted with the women who provide SMS and/or social media data (in Aim 2).
  2. Conduct topic modeling and network analysis of SMS and social media messages to predict HIV prevention needs among young women in Kenya. Working with four clinical sites in Kisumu, study staff will ask approximately 400 women (ages 18-24) seeking HIV testing, PrEP, and other health services to download six months of SMS/social media messages (e.g., WhatsApp, Instagram, Twitter) as a one-time procedure. For those providing data, research assistants will assess the social networks engaged via SMS/social media (e.g., contacts anonymously labeled as peers or sexual partners), administer multiple HIV risk assessments (e.g., the VOICE and Wand risk scores), and obtain HIV test results. Data analysts will use automated structural topic modeling to identify "topics" (clusters of co-occurring words) and assess their association with the other risk assessments (primary outcome) and with HIV test results (exploratory outcome). They will also evaluate how social networks, SMS/social media type, data volume, and language type affect these outcomes. Data collection and analysis procedures will be informed by the Aim 1 findings.
  3. Assess practical factors that may influence use of a sentiment analysis tool in routine care. In a needs assessment based on Implementation Mapping, research assistants will conduct four focus groups with clinic staff (five staff per clinic) and two focus groups with five young women each to explore which staff are best suited to implement a sentiment analysis tool and how it could best be integrated into routine care. The investigators will also assess available resources to identify the most efficient way to develop a preliminary implementation strategy.
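The network analysis in Aim 2 could be summarized per participant roughly as follows. This is a minimal sketch, not the study's pipeline: it assumes each exported message has already been reduced to an (anonymized contact label, message date) pair, and the function name `contact_network_features`, the single `split_date` that divides the six-month window into an "early" and a "late" period, and the "grew"/"weakened"/"stable" labels are all hypothetical. The default of 20 contacts mirrors the "up to 20 contacts" stated in the primary outcome description.

```python
from __future__ import annotations

from collections import Counter
from datetime import date


def contact_network_features(
    messages: list[tuple[str, date]],
    split_date: date,
    max_contacts: int = 20,
) -> dict[str, dict[str, int | str]]:
    """HYPOTHETICAL helper: summarize messaging with a participant's top contacts.

    messages: (anonymized contact label, message date) pairs, e.g. "peer_1".
    split_date: messages dated before it count as "early", the rest as "late".
    """
    totals = Counter(contact for contact, _ in messages)
    early = Counter(contact for contact, day in messages if day < split_date)
    late = Counter(contact for contact, day in messages if day >= split_date)

    # Most frequent contacts first; ties broken by label so output is stable.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))

    features: dict[str, dict[str, int | str]] = {}
    for contact, total in ranked[:max_contacts]:
        # Counter returns 0 for a contact with no messages in a window.
        n_early, n_late = early[contact], late[contact]
        if n_late > n_early:
            trend = "grew"
        elif n_late < n_early:
            trend = "weakened"
        else:
            trend = "stable"
        features[contact] = {
            "total": total,
            "early": n_early,
            "late": n_late,
            "trend": trend,
        }
    return features
```

Comparing raw counts is only fair if `split_date` divides the window into halves of equal length; a finer-grained version might count messages per month and fit a trend instead.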

Study Type

Observational

Enrollment (Actual)

400

Contacts and Locations

Study Locations

      • Kisumu, Kenya
        • KEMRI

Participation Criteria

Eligibility Criteria

Ages Eligible for Study

  • Adult

Accepts Healthy Volunteers

Yes

Sampling Method

Non-Probability Sample

Study Population

Research assistants will recruit young Kenyan women (ages 18-24) attending one of four clinics for any health service. Smartphone ownership and use of SMS, WhatsApp, or other social media are required.

Description

Inclusion Criteria:

  • Identifying as a young woman (age 18-24 years)
  • Attending clinic for any health services, including PrEP and HIV testing
  • Smartphone ownership
  • Ability to understand Kiswahili, DhoLuo, and/or English
  • Use of SMS, WhatsApp, and/or other types of social media

Exclusion Criteria:

  • Inability to provide informed consent (e.g., intoxication, developmental delay)

Study Plan

What is the study measuring?

Primary Outcome Measures

Association of artificial intelligence measure datasets with the VOICE risk score
Time Frame: 6 months

Analysts will examine 6 months of SMS/social media message content from each of the 400 study participants using three computational linguistic methods: 1) sentiment, valence, and arousal analysis; 2) topic modeling; and 3) simple textual counts. Analysts will also perform network analysis with up to 20 contacts from each participant to understand with whom, and how often, the participant communicates. These networks will be examined over time to see whether any connections have grown stronger or weakened.

From these analyses, the investigators will generate multiple measure datasets to compare with the VOICE risk score (i.e., a combined assessment of HIV risk based on age, marital status, support from a sexual partner, the sexual partner's sexual behavior, and alcohol use), as assessed in study participants at the time of SMS/social media data collection.
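Two of the linguistic methods listed for this outcome, valence/arousal analysis and simple textual counts, can be sketched for a single participant as below. This assumes a word-level lexicon approach; `TOY_LEXICON` and its scores are invented placeholders, and `text_features` and `TextFeatures` are hypothetical names rather than the study's tool. A real analysis would need validated norms covering Kiswahili, DhoLuo, and English, plus handling for mixed-language text.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# HYPOTHETICAL toy lexicon: word -> (valence, arousal) on a 1-9 scale.
# The values are invented placeholders, not published norms.
TOY_LEXICON: dict[str, tuple[float, float]] = {
    "happy": (8.0, 6.0),
    "love": (8.5, 6.5),
    "calm": (7.0, 2.0),
    "sad": (2.0, 4.0),
    "afraid": (2.5, 6.5),
}

# Runs of lower-case ASCII letters and apostrophes; a real tokenizer would
# also need to handle code-switching, emoji, and slang.
TOKEN_RE = re.compile(r"[a-z']+")


@dataclass
class TextFeatures:
    n_messages: int
    n_tokens: int
    mean_valence: float | None  # None when no token matched the lexicon
    mean_arousal: float | None


def tokenize(message: str) -> list[str]:
    """Lower-case a message and split it into word tokens."""
    return TOKEN_RE.findall(message.lower())


def text_features(
    messages: list[str],
    lexicon: dict[str, tuple[float, float]] = TOY_LEXICON,
) -> TextFeatures:
    """Simple counts plus mean valence/arousal of lexicon-matched tokens."""
    tokens = [tok for msg in messages for tok in tokenize(msg)]
    scored = [lexicon[tok] for tok in tokens if tok in lexicon]
    if not scored:
        return TextFeatures(len(messages), len(tokens), None, None)
    mean_valence = sum(v for v, _ in scored) / len(scored)
    mean_arousal = sum(a for _, a in scored) / len(scored)
    return TextFeatures(len(messages), len(tokens), mean_valence, mean_arousal)
```

Under this sketch, each participant's `TextFeatures` would form one row of a measure dataset that can then be compared with her risk scores.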

Association of artificial intelligence measure datasets with the Wand risk score
Time Frame: One day
The investigators will compare the above-noted measure datasets with the Wand risk score (i.e., a combined assessment of HIV risk based on age, marital status, age at sexual debut, number of sexual partners, use of injectable contraception, and history of sexually transmitted infections), as assessed in the study participants at the time of SMS/social media data collection.
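The study record does not name the statistical method used to measure "association". One common choice for relating a continuous text or network measure to an ordinal point score such as the VOICE or Wand score is Spearman's rank correlation. The sketch below assumes that choice and computes it from scratch; tied values receive their average rank, since integer risk scores produce many ties. The function names are hypothetical.

```python
from __future__ import annotations

import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rank correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```

For example, `spearman(mean_valence_per_participant, wand_scores)` would return a value between -1 and 1, where values near 0 indicate no monotonic association; both argument names are placeholders.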

Secondary Outcome Measures

Association of artificial intelligence measure datasets with HIV test results
Time Frame: One day
The investigators will compare the above-noted measure datasets with the HIV test results obtained from the study participants at the time of data collection.
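Because an HIV test result is binary, one way to express its association with a measure is the area under the ROC curve (AUC). The sketch below assumes that choice, which the study record does not state, and uses the pairwise (Mann-Whitney) formulation: the probability that a randomly chosen HIV-positive participant has a higher measure than a randomly chosen HIV-negative one, with ties counting one half. The function name `auc` is hypothetical.

```python
from __future__ import annotations


def auc(scores: list[float], hiv_positive: list[bool]) -> float:
    """HYPOTHETICAL helper: ROC AUC via pairwise comparison (Mann-Whitney form).

    0.5 means the measure does not separate positive from negative results;
    1.0 means every positive participant scores higher than every negative one.
    """
    if len(scores) != len(hiv_positive):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, hiv_positive) if y]
    neg = [s for s, y in zip(scores, hiv_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative result")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a "win"
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic, which is fine for 400 participants. With only a few positive results, the estimate would be imprecise, which fits this outcome's exploratory status.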

Study record dates

Study Major Dates

Study Start (Actual)

February 1, 2024

Primary Completion (Actual)

May 15, 2025

Study Completion (Actual)

July 20, 2025

Study Registration Dates

First Submitted

July 30, 2024

First Submitted That Met QC Criteria

August 19, 2024

First Posted (Actual)

August 22, 2024

Study Record Updates

Last Update Posted (Actual)

November 12, 2025

Last Update Submitted That Met QC Criteria

November 8, 2025

Last Verified

November 1, 2025

More Information

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

YES

IPD Plan Description

This project will release and share final de-identified research data and materials from NIH-supported research for use by other researchers in a timely manner. Because of their sensitive nature, the SMS/social media messages themselves will be deleted at the conclusion of the study.

IPD Sharing Time Frame

We will make these data available after publishing our findings.

IPD Sharing Access Criteria

We will post a de-identified dataset to the Harvard Dataverse, a data-sharing platform.

IPD Sharing Supporting Information Type

  • Study Protocol
  • Statistical Analysis Plan (SAP)
  • Informed Consent Form (ICF)

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

This information was retrieved from ClinicalTrials.gov.