Impact of GPT Use on Essay Writing Performance and Cognitive Abilities

Updated July 4, 2025 by: Simiao Chen, University Hospital Heidelberg

A Randomized Controlled Trial on the Impact of Using Generative Artificial Intelligence on Analytical Writing Performance and Cognitive Abilities

The goal of this randomized controlled lab experiment is to examine whether using generative artificial intelligence (AI) technology affects academic performance and cognitive abilities in the context of analytical writing among college students. The main questions it aims to answer are:

  1. Does using the technology affect students' writing performance?
  2. Does using the technology affect students' cognitive effort during the writing process?

Participants will be randomly assigned to either a control group, which writes without AI assistance, or an experimental group, which writes with the assistance of ChatGPT. Researchers will compare the two groups to see whether ChatGPT affects students' writing performance and cognitive effort.

For each participant, the lab experiment will last no more than 1.5 hours. An eye tracker will monitor the participant's gaze activity and pupil size. A functional near-infrared spectroscopy (fNIRS) device will monitor the participant's brain activity in the frontal lobe. During the experiment, participants will be asked to:

  1. Read learning materials on analytical writing techniques.
  2. Based on the previously provided materials, complete an analytical writing assignment that will take approximately 30 minutes either with or without the aid of ChatGPT.
  3. Answer survey questions about their experience with the writing assignment, their attitudes toward using ChatGPT, and their demographic background.

Study Overview

Status

Completed

Intervention / Treatment

Study Type

Interventional

Enrollment (Actual)

160

Phase

  • Not Applicable

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Locations

      • Heidelberg, Germany
        • Core Facility for Neuroscience of Self-Regulation, Heidelberg University

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

  • Adult

Accepts Healthy Volunteers

Yes

Description

Inclusion Criteria:

  • Full-time university student.
  • Able to read and write in English.
  • Use a computer most days of the week.
  • Have not taken, and are not currently preparing for, the Graduate Record Examinations (GRE).
  • Do not wear glasses (contact lenses are allowed).
  • Have no eye impairment.
  • Not currently taking any opioids, epinephrine, or anti-hypertensive drugs.
  • During the experiment, not wearing any makeup around the eyes.

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

  • Primary Purpose: Other
  • Allocation: Randomized
  • Interventional Model: Parallel Assignment
  • Masking: Single
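
The record does not describe the randomization procedure itself. As an illustration only, the following Python sketch shows one common way to produce a randomized, parallel two-arm allocation. The 1:1 ratio, the fixed seed, the participant IDs, and the function name `assign_arms` are all assumptions made for this sketch and are not taken from the study.

```python
import random


def assign_arms(participant_ids, seed=0):
    """Randomly split participants into two equal-sized arms (assumed 1:1)."""
    rng = random.Random(seed)          # seeded so the allocation is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)              # random permutation of all participants
    half = len(shuffled) // 2
    allocation = {pid: "intervention" for pid in shuffled[:half]}
    allocation.update({pid: "control" for pid in shuffled[half:]})
    return allocation


# Hypothetical IDs 1..160, matching the actual enrollment of 160.
allocation = assign_arms(range(1, 161), seed=2024)
```

With an even number of participants, this approach yields two arms of equal size, which matches the parallel assignment model but may differ from the procedure the investigators actually used.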

Arms and Interventions

Participant Group / Arm
Intervention / Treatment
Experimental: Intervention arm
In the intervention arm, participants are instructed to use ChatGPT for assistance to complete an analytical writing task.
The computer interface used for the essay writing task follows a split-screen design. The writing instructions and text input field are presented on a survey platform on the left half of the screen. ChatGPT is placed on the right half of the screen for technology assistance.
No Intervention: Control arm
In the control arm, participants are instructed to complete an analytical writing task independently without access to any technology assistance.

What is the study measuring?

Primary Outcome Measures

Outcome Measure
Measure Description
Time Frame
Writing Performance
Time Frame: 1.5 hours
The essay writing task is derived from the Analytical Writing section of the Graduate Record Examinations (GRE), a standardized, computer-based exam administered worldwide and developed by the Educational Testing Service (ETS). The participants' essays will be scored on a scale from 0 to 6 by a validated automatic scoring tool that is also developed by ETS.
Cognitive Effort Measured by Pupil Size
Time Frame: 1.5 hours
Cognitive effort is quantified by monitoring changes in pupil size. To achieve this, pupil diameters are recorded throughout the writing task using a near-infrared eye tracker, specifically the Tobii Pro Fusion model. At the start of the experiment, individual baseline pupil diameters are measured during a 30-second relaxation task.
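
The record states that pupil diameter is recorded during the task and that an individual baseline is measured beforehand, but it does not specify how the two are combined. A minimal sketch of one plausible approach, subtractive baseline correction, is given below in Python. The function name, the use of `None` for missing samples (for example during blinks), and the choice of a simple mean are assumptions for illustration, not the investigators' analysis pipeline.

```python
from statistics import mean


def baseline_corrected_pupil(task_samples_mm, baseline_samples_mm):
    """Return mean task pupil diameter minus mean baseline diameter (mm).

    Samples equal to None are treated as missing and skipped (assumption).
    """
    valid_task = [s for s in task_samples_mm if s is not None]
    valid_base = [s for s in baseline_samples_mm if s is not None]
    if not valid_task or not valid_base:
        raise ValueError("need at least one valid sample in each recording")
    return mean(valid_task) - mean(valid_base)


# Hypothetical diameters in millimetres; the result is approximately 0.5.
change = baseline_corrected_pupil([3.5, 3.7, None, 3.6], [3.0, 3.2, None, 3.1])
```

A positive value indicates that the pupil was on average more dilated during writing than during the relaxation baseline.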

Secondary Outcome Measures

Outcome Measure
Measure Description
Time Frame
Self-Perception of Writing Performance
Time Frame: 1.5 hours

This is a one-item scale:

Using the same grading rubric from before, what score do you think your essay should get (0 being the lowest and 6 being the highest)?

The score ranges from 0 to 6. A higher score indicates higher self-perceived writing performance. The variable is treated as a continuous variable.

Self-Perception of Cognitive Effort
Time Frame: 1.5 hours

This is a one-item Likert scale adapted from the National Aeronautics and Space Administration Task Load Index (NASA-TLX; Hart, 2006; Hart & Staveland, 1988):

On a scale of 1 to 7, rate how hard you have to work to accomplish your level of performance.

The Likert score ranges from 1 to 7 (1 being "very low" and 7 being "very high"). A higher score indicates higher self-perceived cognitive effort. The variable is treated as a continuous variable.

References:

  1. Hart, S. G. (2006). NASA-Task Load Index (NASA-TLX); 20 Years Later. Proceedings of the Human Factors and Ergonomics Society Annual Meeting. https://doi.org/10.1177/154193120605000909
  2. Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Advances in Psychology (Vol. 52, pp. 139-183). North-Holland.
Cognitive Effort Measured by Cortical Hemodynamic Activity in the Frontal Lobe
Time Frame: 1.5 hours
Cognitive effort is quantified by monitoring changes in cortical hemodynamic activity in the frontal lobe. To achieve this, brain activity is recorded throughout the writing task using a functional near-infrared spectroscopy (fNIRS) device, specifically the NIRSport2 model.
Self-Perception of Stress
Time Frame: 1.5 hours

This is a one-item Likert sub-scale adapted from the Primary Appraisal Secondary Appraisal scale (PASA; Gaab, 2009; Pollak et al., 2020):

On a scale of 1 to 7, how much would you agree or disagree with the following statement on perceived stress: The analytical writing assignment was stressful to me.

The Likert score ranges from 1 to 7 (1 being "strongly disagree" and 7 being "strongly agree"). A higher score indicates higher self-perceived stress. The variable is treated as a continuous variable.

References:

  1. Gaab, J. (2009). PASA-Primary Appraisal Secondary Appraisal. A questionnaire for the assessment of cognitive appraisals of situations. Verhaltenstherapie, 19(2), 114-115.
  2. Pollak, A., Paliga, M., Pulopulos, M. M., Kozusznik, B., & Kozusznik, M. W. (2020). Stress in manual and autonomous modes of collaboration with a cobot. Computers in Human Behavior, 112, 106469. https://doi.org/10.1016/j.chb.2020.106469
Self-Perception of Challenge
Time Frame: 1.5 hours

This is a one-item Likert sub-scale adapted from the Primary Appraisal Secondary Appraisal scale (PASA; Gaab, 2009; Pollak et al., 2020):

On a scale of 1 to 7, how much would you agree or disagree with the following statement on perceived challenge: I find the analytical writing assignment a challenge.

The Likert score ranges from 1 to 7 (1 being "strongly disagree" and 7 being "strongly agree"). A higher score indicates higher self-perceived challenge. The variable is treated as a continuous variable.

References:

  1. Gaab, J. (2009). PASA-Primary Appraisal Secondary Appraisal. A questionnaire for the assessment of cognitive appraisals of situations. Verhaltenstherapie, 19(2), 114-115.
  2. Pollak, A., Paliga, M., Pulopulos, M. M., Kozusznik, B., & Kozusznik, M. W. (2020). Stress in manual and autonomous modes of collaboration with a cobot. Computers in Human Behavior, 112, 106469. https://doi.org/10.1016/j.chb.2020.106469
Self-Efficacy in Writing
Time Frame: 1.5 hours

This is a sixteen-item Likert scale that measures three dimensions of writing self-efficacy: ideation, convention and self-regulation (Bruning et al., 2013). The Likert score ranges from 1 to 7 (1 being "strongly disagree" and 7 being "strongly agree"). A higher score indicates higher self-efficacy. The three dimensions will be treated separately, each as a continuous variable.

Reference:

  1. Bruning, R., Dempsey, M., Kauffman, D. F., McKim, C., & Zumbrunn, S. (2013). Examining dimensions of self-efficacy for writing. Journal of Educational Psychology, 105(1), 25.

Situational Interest in Analytical Writing
Time Frame: 1.5 hours

This is a four-item Likert scale adapted from the situational interest scale (Hulleman et al., 2010). This scale measures participants' situational interest in analytical writing:

On a scale of 1 to 7, how much would you agree or disagree with the following statements on your interest in the analytical writing assignment that you just completed?

  1. The analytical writing assignment was interesting.
  2. Working on the essay was fun.
  3. I enjoyed writing the essay.
  4. The analytical writing assignment was enjoyable.

The Likert score ranges from 1 to 7 (1 being "strongly disagree" and 7 being "strongly agree"). A higher score indicates higher situational interest. The variable is treated as a continuous variable.

Reference:

  1. Hulleman, C. S., Godes, O., Hendricks, B. L., & Harackiewicz, J. M. (2010). Enhancing interest and performance with a utility value intervention. Journal of Educational Psychology, 102(4), 880.

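
The record says that multi-item Likert measures such as the four-item situational interest scale are treated as continuous variables, but it does not state how item responses are aggregated. The Python sketch below assumes the common convention of averaging the items; the function name `score_likert_scale` and the example responses are hypothetical.

```python
def score_likert_scale(responses, n_items, low=1, high=7):
    """Average item responses into one continuous score (assumed aggregation)."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    for r in responses:
        if not low <= r <= high:
            raise ValueError(f"response {r} outside the {low}-{high} range")
    return sum(responses) / n_items


# Hypothetical answers to the four situational interest items.
interest = score_likert_scale([6, 5, 5, 6], n_items=4)  # 5.5
```

The same function could be applied to the two-item behavioral intention scale by passing `n_items=2`, again assuming that averaging is the intended aggregation.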
Behavioral Intention in Using ChatGPT
Time Frame: 1.5 hours

This is a two-item Likert scale that measures participants' behavioral intention in using ChatGPT in the future for essay writing tasks (Albayati, 2024):

On a scale of 1 to 7, how much would you agree or disagree with the following statements on using ChatGPT in essay writing assignments?

  1. If I have access to ChatGPT, I would use it for essay writing tasks.
  2. I plan to use ChatGPT in the future if I have an essay writing task.

The Likert score ranges from 1 to 7 (1 being "strongly disagree" and 7 being "strongly agree"). A higher score indicates higher behavioral intention in using ChatGPT. The variable is treated as a continuous variable.

Reference:

  1. Albayati, H. (2024). Investigating undergraduate students' perceptions and awareness of using ChatGPT as a regular assistance tool: A user acceptance perspective study. Computers and Education: Artificial Intelligence, 6, 100203. https://doi.org/10.1016/j.caeai.2024.100203


Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Investigators

  • Principal Investigator: Till Bärnighausen, Heidelberg Institute of Global Health

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

July 18, 2024

Primary Completion (Actual)

February 20, 2025

Study Completion (Actual)

February 20, 2025

Study Registration Dates

First Submitted

July 15, 2024

First Submitted That Met QC Criteria

July 15, 2024

First Posted (Actual)

July 19, 2024

Study Record Updates

Last Update Posted (Actual)

July 9, 2025

Last Update Submitted That Met QC Criteria

July 4, 2025

Last Verified

July 1, 2025

More Information

Terms related to this study

Other Study ID Numbers

  • S-117/2024

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

Product manufactured in and exported from the U.S.

No

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.
