Applying the Model-Comparison Approach to Test Specific Research Hypotheses in Psychophysical Research Using the Palamedes Toolbox

Nicolaas Prins, Frederick A A Kingdom

Abstract

In the social sciences it is common practice to test specific theoretically motivated research hypotheses using formal statistical procedures. Typically, students in these disciplines are trained in such methods starting at an early stage in their academic tenure. On the other hand, in psychophysical research, where parameter estimates are generally obtained using a maximum-likelihood (ML) criterion and data do not lend themselves well to the least-squares methods taught in introductory courses, it is relatively uncommon to see formal model comparisons performed. Rather, it is common practice to estimate the parameters of interest (e.g., detection thresholds) and their standard errors individually across the different experimental conditions and to 'eyeball' whether the observed pattern of parameter estimates supports or contradicts some proposed hypothesis. We believe that this is at least in part due to a lack of training in the proper methodology as well as a lack of available software to perform such model comparisons when ML estimators are used. We introduce here a relatively new toolbox of Matlab routines called Palamedes which allows users to perform sophisticated model comparisons. In Palamedes, we implement the model-comparison approach to hypothesis testing. This approach allows researchers considerable flexibility in targeting specific research hypotheses. We discuss in a non-technical manner how this method can be used to perform statistical model comparisons when ML estimators are used. With Palamedes we hope to make sophisticated statistical model comparisons available to researchers who may not have the statistical background or the programming skills to perform such model comparisons from scratch. Note that while Palamedes is specifically geared toward psychophysical data, the core ideas behind the model-comparison approach that our paper discusses generalize to any field in which statistical hypotheses are tested.

Keywords: model comparisons; psychometrics; psychophysics; software; statistics.

Figures

FIGURE 1
(A) Results of a hypothetical experiment in which observers are tested in a Vernier-alignment task. Plotted are the proportions of responding ‘left’ for each of the five Vernier alignments used. The observed proportions correct also define the saturated model which makes no assumptions as to how the probability of a correct response depends on experimental condition or stimulus intensity. (B) Four different models of the results shown in (A). The models differ with respect to their assumptions regarding two of the four parameters of a PF (location and slope). The text describes how to perform model comparisons between the models labeled here as ‘fuller,’ ‘lesser,’ and ‘saturated’ (the latter shown in A).
FIGURE 2
Schematic depiction of the model-comparison approach as applied to the research question described in the Section “A Simple One-Condition Example Demonstrating the Model-Comparison Approach.” Each circle represents a model of the data shown in Figure 1. Models differ with respect to the assumptions they make. The assumptions that each of the models make are listed in the circles that represent the models. The lines connecting pairs of models are labeled with the assumptions that differ between the models. Under the model-comparison approach, specific assumptions are tested by comparing a model that makes the assumption(s) to a model that does not make the assumption(s). For example, in order to test whether the location parameter of a PF equals zero (i.e., whether α = 0), we compare the top left (‘fuller’) model which does not make the assumption to the top right model which does make the assumption. Note that otherwise the two models make the same assumptions. Model comparisons may also be performed between models that differ with respect to multiple assumptions. For example, a Goodness-of-Fit test tests all of a model’s assumptions except the assumptions of independence and stability. The p-values resulting from the three model comparisons shown here are given in this figure.
FIGURE 3
(A) Results of a hypothetical experiment in which observers perform a Vernier-alignment task under two experimental conditions (solid versus open symbols). Under each condition, five stimulus intensities are used. Plotted are the proportions of responding ‘left’ for each of the 10 combinations of experimental condition and stimulus intensity. The proportions correct also define the saturated model which makes no assumptions as to how the probability of a correct response depends on experimental condition or stimulus intensity. (B) Nine different models of the results shown in (A). The models differ with respect to their assumptions regarding two of the four parameters of a PF (location and slope). The text describes model comparisons between the models labeled here as ‘fuller,’ ‘lesser,’ and ‘saturated’ (the latter shown in A).
FIGURE 4
Similar to Figure 2 but now applied to the two-condition experiment described in the Section “A Two-Condition Example.” Each circle represents a model of the data shown in Figure 3A. The fuller model does not assume that the slopes are equal, while the lesser model does make this assumption. Note that otherwise the models make the same assumptions.
FIGURE 5
The histogram displays an empirical sampling distribution for the transformed likelihood ratio (TLR) for our example model comparison between the fuller and lesser model described in the Section “A Two-Condition Example.” The distribution is based on 10,000 Monte Carlo simulations. The curve corresponds to the theoretical χ² distribution with 1 degree of freedom.
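
The comparison that Figures 2 and 5 describe can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Palamedes implementation: the response counts are invented, the PF is taken to be a logistic with guess and lapse rates fixed at 0, and a coarse grid search stands in for a proper ML optimizer. The function names (`pf`, `fit`, `tlr`, `chi2_sf_1df`, `monte_carlo_p`) are hypothetical. The sketch fits a fuller model (α free) and a lesser model (α = 0), forms the TLR, and obtains a p-value both from the χ² distribution with 1 degree of freedom and from a Monte Carlo sampling distribution generated under the lesser model.

```python
import math
import random

# Hypothetical data (not from the paper): Vernier offsets, trials, 'left' counts.
X = [-2.0, -1.0, 0.0, 1.0, 2.0]
N = [50, 50, 50, 50, 50]
K = [3, 11, 30, 42, 48]

# Coarse parameter grids (an assumption; a real fit would use an optimizer).
ALPHAS = [i / 20 for i in range(-20, 21)]  # candidate locations, includes 0.0
BETAS = [i / 10 for i in range(5, 41)]     # candidate slopes

def pf(x, alpha, beta):
    """Logistic psychometric function with guess and lapse rates fixed at 0."""
    return 1.0 / (1.0 + math.exp(-beta * (x - alpha)))

def log_lik(k, alpha, beta):
    """Binomial log-likelihood of counts k (constant term omitted)."""
    total = 0.0
    for x, n, ki in zip(X, N, k):
        p = min(max(pf(x, alpha, beta), 1e-9), 1.0 - 1e-9)
        total += ki * math.log(p) + (n - ki) * math.log(1.0 - p)
    return total

def fit(k, alphas):
    """Grid-search ML fit; returns (max log-likelihood, alpha, beta)."""
    return max((log_lik(k, a, b), a, b) for a in alphas for b in BETAS)

def tlr(k):
    """TLR for fuller (alpha free) vs lesser (alpha = 0); also returns lesser slope."""
    ll_fuller, _, _ = fit(k, ALPHAS)
    ll_lesser, _, beta_lesser = fit(k, [0.0])
    return 2.0 * (ll_fuller - ll_lesser), beta_lesser

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def monte_carlo_p(observed, beta_lesser, n_sims=200, seed=1):
    """Proportion of TLRs simulated under the lesser model that reach the observed TLR."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sims):
        sim = [sum(rng.random() < pf(x, 0.0, beta_lesser) for _ in range(n))
               for x, n in zip(X, N)]
        if tlr(sim)[0] >= observed:
            exceed += 1
    return exceed / n_sims

if __name__ == "__main__":
    observed, beta_lesser = tlr(K)
    print("TLR:", observed)
    print("p (chi-square, 1 df):", chi2_sf_1df(observed))
    print("p (Monte Carlo):", monte_carlo_p(observed, beta_lesser))
```

Because the fuller grid contains α = 0, the fuller model can never fit worse than the lesser one, so the TLR is non-negative. The sketch uses 200 simulations by default to keep the run time short; the figure is based on 10,000.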

Source: PubMed
