Gorilla in our midst: An online behavioral experiment builder

Alexander L Anwyl-Irvine, Jessica Massonnié, Adam Flitton, Natasha Kirkham, Jo K Evershed

Abstract

Behavioral researchers are increasingly conducting their studies online, to gain access to large and diverse samples that would be difficult to recruit in a laboratory environment. However, there are technical access barriers to building experiments online, and web browsers can present problems for consistent timing, an important issue for reaction-time-sensitive measures. For example, to ensure accuracy and test-retest reliability in presentation and response recording, experimenters need a working knowledge of programming languages such as JavaScript. We review some of the previous and current tools for online behavioral research, as well as how well they address the issues of usability and timing. We then present the Gorilla Experiment Builder (gorilla.sc), a fully tooled experiment authoring and deployment platform, designed to resolve many timing issues and make reliable online experimentation open and accessible to a wider range of technical abilities. To demonstrate the platform's aptitude for accessible, reliable, and scalable research, we administered a task with a range of participant groups (primary school children and adults), settings (without supervision, at home, and under supervision, in schools and at public engagement events), equipment (the participant's own computer, or a computer supplied by the researcher), and connection types (personal internet connection, mobile phone 3G/4G). We used a simplified flanker task taken from the attentional network task (Rueda, Posner, & Rothbart, 2004). We replicated the "conflict network" effect in all these populations, demonstrating the platform's capability to run reaction-time-sensitive experiments. Unresolved limitations of running experiments online are then discussed, along with potential solutions and some future features of the platform.
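The "conflict network" effect replicated in the study is simply the performance difference between incongruent and congruent flanker trials (as plotted in Figs. 4 and 5). A minimal sketch of that computation, using illustrative reaction-time data rather than values from the paper:

```python
# Illustrative flanker trials as (reaction time in ms, condition) pairs.
# These numbers are made up for demonstration; they are not study data.
trials = [
    (450, "congruent"), (470, "congruent"), (460, "congruent"),
    (520, "incongruent"), (540, "incongruent"), (530, "incongruent"),
]

def conflict_effect(trials):
    """Mean incongruent RT minus mean congruent RT, in milliseconds.

    A positive value indicates a conflict cost: responses are slower
    when flankers point the opposite way to the target.
    """
    cong = [rt for rt, cond in trials if cond == "congruent"]
    incong = [rt for rt, cond in trials if cond == "incongruent"]
    return sum(incong) / len(incong) - sum(cong) / len(cong)

print(conflict_effect(trials))  # → 70.0 (530 ms − 460 ms)
```

The same subtraction applied to accuracy rather than RT yields the accuracy-difference distributions shown in Fig. 4.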

Keywords: Attentional control; Browser timing; Online methods; Online research; Remote testing; Timing accuracy.

Figures

Fig. 1
Example of the two main GUI elements of Gorilla. (A) The Task Builder, with a screen selected showing how a trial is laid out. (B) The Experiment Builder, showing a check for the participant, followed by a randomizer node that allocates the participant to one of two conditions, before sending them to a Finish node
Fig. 2
Trial types for Experiment 1: Different conditions used in the flanker task
Fig. 3
Time course of a typical trial in Experiment 1. These screens represent what the participant was seeing within the web browser
Fig. 4
Distribution of accuracy differences between congruent and incongruent trials, for each group in Experiment 1. Group A was children in school in Corsica, France; Group B consisted of children in schools in London, UK; and Group C consisted of children attending a university public engagement event in London
Fig. 5
Distribution of RT differences between congruent and incongruent trials for each group in Experiment 1
Fig. 6
Trial types for Experiment 2: Different conditions used in the flanker task
Fig. 7
Time course of a typical trial in Experiment 2. These screens represent what the participant was seeing within the web browser

References

    1. Adjerid I, Kelley K. Big data in psychology: A framework for research advancement. American Psychologist. 2018;73:899–917. doi: 10.1037/amp0000190.
    2. Barnhoorn JS, Haasnoot E, Bocanegra BR, van Steenbergen H. QRTEngine: An easy solution for running online reaction time experiments using Qualtrics. Behavior Research Methods. 2015;47:918–929. doi: 10.3758/s13428-014-0530-7.
    3. Casler K, Bickel L, Hackett E. Separate but equal? A comparison of participants and data gathered via Amazon’s MTurk, social media, and face-to-face behavioral testing. Computers in Human Behavior. 2013;29:2156–2160. doi: 10.1016/j.chb.2013.05.009.
    4. Chen, S.-C., de Koning, B., & Zwaan, R. A. (2018). Does object size matter with regard to the mental simulation of object orientation? Open Science Framework. Retrieved from
    5. Crump MJC, McDonnell JV, Gureckis TM. Evaluating Amazon’s Mechanical Turk as a tool for experimental behavioral research. PLoS ONE. 2013;8:e57410. doi: 10.1371/journal.pone.0057410.
    6. de Leeuw JR. jsPsych: A JavaScript library for creating behavioral experiments in a Web browser. Behavior Research Methods. 2015;47:1–12. doi: 10.3758/s13428-014-0458-y.
    7. de Leeuw JR, Motz BA. Psychophysics in a Web browser? Comparing response times collected with JavaScript and Psychophysics Toolbox in a visual search task. Behavior Research Methods. 2016;48:1–12. doi: 10.3758/s13428-015-0567-2.
    8. Fan J, McCandliss BD, Sommer T, Raz A, Posner MI. Testing the efficiency and independence of attentional networks. Journal of Cognitive Neuroscience. 2002;14:340–347. doi: 10.1162/089892902317361886.
    9. Faul F, Erdfelder E, Buchner A, Lang A-G. Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods. 2009;41:1149–1160. doi: 10.3758/BRM.41.4.1149.
    10. Ferdman S, Minkov E, Bekkerman R, Gefen D. Quantifying the web browser ecosystem. PLoS ONE. 2017;12:e0179281. doi: 10.1371/journal.pone.0179281.
    11. Garaizar P, Reips U-D. Best practices: Two Web-browser-based methods for stimulus presentation in behavioral experiments with high-resolution timing requirements. Behavior Research Methods. 2018;51(3):1441–1453. doi: 10.3758/s13428-018-1126-4.
    12. Garaizar P, Vadillo MA, López-de Ipiña D. 2012 9th International Conference on Remote Engineering and Virtual Instrumentation (REV) Piscataway, NJ: IEEE Press; 2012. Benefits and pitfalls of using HTML5 APIs for online experiments and simulations; pp. 1–7.
    13. Garaizar P, Vadillo MA, López-de Ipiña D. Presentation accuracy of the web revisited: Animation methods in the HTML5 era. PLoS ONE. 2014;9:e109812. doi: 10.1371/journal.pone.0109812.
    14. Hauser DJ, Schwarz N. Attentive Turkers: MTurk participants perform better on online attention checks than do subject pool participants. Behavior Research Methods. 2016;48:400–407. doi: 10.3758/s13428-015-0578-z.
    15. Henninger, F., Mertens, U. K., Shevchenko, Y., & Hilbig, B. E. (2017). lab.js: Browser-based behavioral research (Software). 10.5281/zenodo.597045
    16. Hentschke H, Stüttgen MC. Computation of measures of effect size for neuroscience data sets. European Journal of Neuroscience. 2011;34:1887–1894. doi: 10.1111/j.1460-9568.2011.07902.x.
    17. Ipeirotis PG, Paritosh PK. Proceedings of the 20th international conference companion on World Wide Web. New York, NY: ACM Press; 2011. Managing crowdsourced human computation: A tutorial; pp. 287–288.
    18. Jacques JT, Kristensson PO. Proceedings of the First ACM Workshop on Mobile Crowdsensing Systems and Applications. New York, NY: ACM Press; 2017. Design strategies for efficient access to mobile device users via Amazon Mechanical Turk; pp. 25–30.
    19. Jasmin, K., Dick, F., Holt, L., & Tierney, A. T. (2018). Degeneracy makes music and speech robust to individual differences in perception. bioRxiv preprint. 10.1101/263079
    20. Jia R, Guo H, Wang Y, Zhang J. 2018 13th IEEE Conference on Industrial Electronics and Applications (ICIEA) Piscataway, NJ: IEEE Press; 2018. Analysis and test of sound delay on Web Audio under different situations; pp. 1515–1519.
    21. Jones, A. L. (2018). Beyond average: Using face regression to study social perception. OSF. Retrieved from
    22. Kocher, P., Genkin, D., Gruss, D., Haas, W., Hamburg, M., Lipp, M., . . . Yarom, Y. (2018). Spectre attacks: Exploiting speculative execution. arXiv preprint. arXiv:1801.01203.
    23. Koivisto M, Grassini S. Neural processing around 200 ms after stimulus-onset correlates with subjective visual awareness. Neuropsychologia. 2016;84:235–243. doi: 10.1016/j.neuropsychologia.2016.02.024.
    24. Lakens D. Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t tests and ANOVAs. Frontiers in Psychology. 2013;4:863. doi: 10.3389/fpsyg.2013.00863.
    25. Lange K, Kühn S, Filevich E. “Just Another Tool for Online Studies” (JATOS): An easy solution for setup and management of web servers supporting online studies. PLoS ONE. 2015;10:e0130834. doi: 10.1371/journal.pone.0130834.
    26. Lavan, N., Knight, S., & McGettigan, C. (2018). Listeners form average-based representations of individual voice identities—even when they have never heard the average. PsyArXiv preprint. 10.31234/
    27. Lumsden J, Skinner A, Coyle D, Lawrence N, Munafò M. Attrition from web-based cognitive testing: A repeated measures comparison of gamification techniques. Journal of Medical Internet Research. 2017;19:e395. doi: 10.2196/jmir.8473.
    28. MacLeod JW, Lawrence MA, McConnell MM, Eskes GA, Klein RM, Shore DI. Appraising the ANT: Psychometric and theoretical considerations of the Attention Network Test. Neuropsychology. 2010;24:637–651. doi: 10.1037/a0019803.
    29. Miller R, Schmidt K, Kirschbaum C, Enge S. Comparability, stability, and reliability of internet-based mental chronometry in domestic and laboratory settings. Behavior Research Methods. 2018;50:1345–1358. doi: 10.3758/s13428-018-1036-5.
    30. Mozilla. (2019). Performance.now(). Retrieved January 17, 2019, from
    31. Nakibly, G., Shelef, G., & Yudilevich, S. (2015). Hardware fingerprinting using HTML5. arXiv preprint. arXiv:1503.01408
    32. Palan S, Schitter C. Prolific.ac—A subject pool for online experiments. Journal of Behavioral and Experimental Finance. 2018;17:22–27. doi: 10.1016/j.jbef.2017.12.004.
    33. Papoutsaki A, Sangkloy P, Laskey J, Daskalova N, Huang J, Hays J. Proceedings of the Twenty Fifth International Joint Conference on Artificial Intelligence—IJCAI 2016. Arlington, VA: National Science Foundation; 2016. WebGazer: Scalable webcam eye tracking using user interactions; pp. 3839–3845.
    34. Peirce JW, MacAskill MR. Building experiments in PsychoPy. London, UK: Sage; 2018.
    35. Pollock L. Statistical and methodological problems with concreteness and other semantic variables: A list memory experiment case study. Behavior Research Methods. 2018;50:1198–1216. doi: 10.3758/s13428-017-0938-y.
    36. Poort, E. D., & Rodd, J. M. (2017). Studies of cross-lingual long-term priming. PsyArXiv preprint. 10.31234/
    37. Reimers S, Stewart N. Presentation and response timing accuracy in Adobe Flash and HTML5/JavaScript Web experiments. Behavior Research Methods. 2014;47(2):309–327. doi: 10.3758/s13428-014-0471-1.
    38. Reimers S, Stewart N. Auditory presentation and synchronization in Adobe Flash and HTML5/JavaScript Web experiments. Behavior Research Methods. 2016;48(3):897–908. doi: 10.3758/s13428-016-0758-5.
    39. Richards G, Lebresne S, Burg B, Vitek J. Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation. New York, NY, USA: ACM Press; 2010. An analysis of the dynamic behavior of JavaScript programs; pp. 1–12.
    40. Richardson, D. C., Griffin, N. K., Zaki, L., Stephenson, A., Yan, J., Curry, T., . . . Devlin, J. T. (2018). Measuring narrative engagement: The heart tells the story. bioRxiv preprint. 10.1101/351148
    41. Ritter, T., & Mozilla. (2018). Bug 1440863, comment 13 (Bug report). Retrieved January 17, 2019, from
    42. Ross J, Irani L, Silberman M, Zaldivar A, Tomlinson B. CHI’10 extended abstracts on human factors in computing systems. New York, NY: ACM Press; 2010. Who are the crowdworkers? Shifting demographics in Mechanical Turk; pp. 2863–2872.
    43. Rueda MR, Posner MI, Rothbart MK. Handbook of self-regulation: Research, theory, and applications. New York, NY: Guilford Press; 2004. Attentional control and self-regulation; pp. 283–300.
    44. Rutiku R, Aru J, Bachmann T. General markers of conscious visual perception and their timing. Frontiers in Human Neuroscience. 2016;10:23. doi: 10.3389/fnhum.2016.00023.
    45. Saito T, Yasuda K, Ishikawa T, Hosoi R, Takahashi K, Chen Y, Zalasiński M. 2016 10th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS) Piscataway, NJ: IEEE Press; 2016. Estimating CPU features by browser fingerprinting; pp. 587–592.
    46. Schmidt WC. The server side of psychology Web experiments. In: Birnbaum MH, editor. Psychological experiments on the Internet. New York, NY: Academic Press; 2000. pp. 285–310.
    47. Schmidt WC. Presentation accuracy of Web animation methods. Behavior Research Methods, Instruments, & Computers. 2001;33:187–200. doi: 10.3758/BF03195365.
    48. Schwarz M, Maurice C, Gruss D, Mangard S. International Conference on Financial Cryptography and Data Security. Cham, Switzerland: Springer; 2017. Fantastic timers and where to find them: High-resolution microarchitectural attacks in JavaScript; pp. 247–267.
    49. Semmelmann K, Weigelt S. Online webcam-based eye tracking in cognitive science: A first look. Behavior Research Methods. 2018;50:451–465. doi: 10.3758/s13428-017-0913-7.
    50. Severance C. JavaScript: Designing a language in 10 days. Computer. 2012;45:7–8. doi: 10.1109/MC.2012.57.
    51. Stoet G. PsyToolkit: A novel Web-based method for running online questionnaires and reaction-time experiments. Teaching of Psychology. 2017;44:24–31. doi: 10.1177/0098628316677643.
    52. Turner M, Budgen D, Brereton P. Turning software into a service. Computer. 2003;36:38–44. doi: 10.1109/MC.2003.1236470.
    53. Usher-Smith JA, Masson G, Mills K, Sharp SJ, Sutton S, Klein WMP, Griffin SJ. A randomised controlled trial of the effect of providing online risk information and lifestyle advice for the most common preventable cancers: Study protocol. BMC Public Health. 2018;18:796. doi: 10.1186/s12889-018-5712-2.
    54. Whelan R. Effective analysis of reaction time data. Psychological Record. 2008;58:475–482. doi: 10.1007/BF03395630.
    55. Woods AT, Velasco C, Levitan CA, Wan X, Spence C. Conducting perception research over the internet: A tutorial review. PeerJ. 2015;3:e1058. doi: 10.7717/peerj.1058.
    56. World Medical Association. World Medical Association Declaration of Helsinki: Ethical principles for medical research involving human subjects. JAMA. 2013;310:2191–2194. doi: 10.1001/jama.2013.281053.
    57. World Wide Web Consortium. (2019). Standards, HTML current status (Webpage). Retrieved March 22, 2019, from
    58. Yung A, Cardoso-Leite P, Dale G, Bavelier D, Green CS. Methods to test visual attention online. Journal of Visualized Experiments. 2015;96:e52470. doi: 10.3791/52470.
    59. Zaytsev, J. (2019). ECMAScript compatibility tables (GitHub repository). Retrieved January 8, 2019, from
    60. Zloteanu M, Harvey N, Tuckett D, Livan G. Digital identity: The effect of trust and reputation information on user judgement in the sharing economy. PLoS ONE. 2018;13(12):e0209071. doi: 10.1371/journal.pone.0209071.
    61. Zotos E, Herpers R. 2012 International Conference on Cyberworlds (CW) New York, NY: IEEE Press; 2012. Interactive distributed rendering of 3D scenes on multiple Xbox 360 systems and personal computers; pp. 114–121.
    62. Zotos E, Herpers R. Distributed rendering for interactive multi-screen visualization environments based on XNA Game Studio. In: Gavrilova ML, Tan CJK, Kuijper A, editors. Transactions in computational science XVIII (Lecture Notes in Computer Science) Berlin, Germany: Springer; 2013. pp. 1–20.

Source: PubMed
