How one block of trials influences the next: persistent effects of disease prevalence and feedback on decisions about images of skin lesions in a large online study

Jeremy M Wolfe

Abstract

Using an online medical image labeling app, 803 individuals rated images of skin lesions as either "melanoma" (skin cancer) or "nevus" (a benign skin mole). Each block consisted of 80 images. Blocks could have high (50%) or low (20%) target prevalence and could provide full, accurate feedback or no feedback. As in prior work, with feedback, decision criteria were more conservative at low prevalence than at high prevalence, producing more miss errors. Without feedback, this low-prevalence effect was reversed (albeit not significantly). Participants could complete up to four different conditions per day on each of 6 days. Our main interest was the effect of Block N on Block N + 1. Low prevalence with feedback made participants more conservative on a subsequent block; high prevalence with feedback made them more liberal. Conditions without feedback had no significant impact on the subsequent block. The delay between Blocks 1 and 2 had no significant effect, and the effect on the second half of Block 2 was just as large as on the first half. Medical expertise (over the range available in the study) had no impact on these effects, though medical students were better at the task than the other groups. Overall, these appear to be robust effects in which feedback may be 'teaching' participants how to respond in the future. This might have application in, for example, training or re-training situations.
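The abstract summarizes performance with the standard signal-detection measures d′ (sensitivity) and criterion c (response bias), where a more positive c indicates a more conservative observer who misses more targets. As a minimal illustrative sketch (not the authors' analysis code), these measures can be computed from hit and false-alarm counts; the 0.5-count correction used here is one common convention for avoiding infinite z-scores:

```python
from statistics import NormalDist

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Compute d' and criterion c from raw trial counts.

    Adds 0.5 to each cell (a common correction) so that perfect
    hit or false-alarm rates of 0 or 1 do not yield infinite z-scores.
    """
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion

# Hypothetical low-prevalence block of 80 trials with 16 targets:
# many misses relative to false alarms implies a conservative criterion (c > 0).
dp, c = sdt_measures(hits=10, misses=6, false_alarms=4, correct_rejections=60)
```

On these hypothetical counts, d′ is positive (the observer discriminates melanoma from nevus above chance) and c is positive (a conservative bias toward "nevus"), the pattern the abstract reports for low-prevalence blocks with feedback.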

Conflict of interest statement

The authors declare that they have no competing interests.

© 2022. The Author(s).

Figures

Fig. 1
Sample stimulus displays. Left: participants are asked if a lesion is a melanoma (cancer) or a nevus (benign ‘mole’). Right: after response, if this were a feedback condition, red or green feedback would inform the participant of the correctness of the answer. In no feedback conditions, neutral feedback indicated that the response had been registered
Fig. 2
D′ and criterion as a function of condition. Each dot represents one of 2080 blocks of data. Black lines show mean and ± 95% CI of the mean
Fig. 3
Median response time as a function of block type. A RT distributions, with the graph truncated at 3 s for display purposes. Black lines show the mean. B Means of the median RTs on a much finer scale. Error bars show ± 95% confidence intervals
Fig. 4
Change in d′ on block 2 as a function of the nature of block 1. p values show results for simple t-tests, testing against the null hypothesis that there is no change in d′. Black lines show means ± 95% confidence intervals
Fig. 5
Change in criterion on block 2 as a function of the nature of block 1. p values show results for simple t-tests, testing against the null hypothesis that there is no change in criterion. Black lines show means ± 95% confidence intervals
Fig. 6
Change in criterion as a function of time between the start of block 1 and start of block 2. 1440 min = 1 day. Colored lines are best-fit linear regressions for each of 16 pairs of block types. A All data, B Delay < 1 day, C Delay < 2 h, D Delay < 15 min. Repeat pairs (e.g., Low Feedback → Low Feedback) do not occur for shorter delays (C, D)
Fig. 7
Change in criterion on block 2 as a function of the nature of block 1. Data restricted to pairs separated by more than 2 h
Fig. 8
Mean change in criterion for the first and second halves of a block of trials
Fig. 9
Effects of expertise category on d′ and criterion, collapsed across prevalence and feedback conditions
Fig. 10
Hypothetical curves showing the results of the feedback and no feedback conditions as two points on continuous functions


Source: PubMed
