A new model of decision processing in instrumental learning tasks

Steven Miletić, Russell J Boag, Anne C Trutti, Niek Stevenson, Birte U Forstmann, Andrew Heathcote

Abstract

Learning and decision-making are interactive processes, yet cognitive modeling of error-driven learning and decision-making has largely evolved separately. Recently, evidence accumulation models (EAMs) of decision-making and reinforcement learning (RL) models of error-driven learning have been combined into joint RL-EAMs that can in principle address these interactions. However, we show that the most commonly used combination, based on the diffusion decision model (DDM) for binary choice, consistently fails to capture crucial aspects of response times observed during reinforcement learning. We propose a new RL-EAM based on an advantage racing diffusion (ARD) framework for choices among two or more options that not only addresses this problem but captures stimulus difficulty, speed-accuracy trade-off, and stimulus-response-mapping reversal effects. The RL-ARD avoids fundamental limitations imposed by the DDM on addressing effects of absolute values of choices, as well as extensions beyond binary choice, and provides a computationally tractable basis for wider applications.

Keywords: computational modelling; evidence accumulation; human; neuroscience; reinforcement learning; value-based decision making.

Conflict of interest statement

SM, RB, AT, NS, AH: No competing interests declared. BF: Reviewing editor, eLife.

© 2021, Miletić et al.

Figures

Figure 1. Comparison of the decision-making models.
Bottom graphs visualize how Q-values are linked to accumulation rates. Top panels illustrate the evidence-accumulation process of the DDM (panel A) and the racing diffusion (RD) models (panels B and C). Note that in the race models there is no lower bound. Equations 2–4 formally link Q-values to evidence-accumulation rates. In the RL-DDM, the difference (Δ) in Q-values is accumulated, weighted by the free parameter w, plus additive within-trial white noise W with standard deviation s. In the RL-RD, the (weighted) Q-values for the two choice options are accumulated independently. An evidence-independent baseline urgency term V0 (equal for all accumulators) further drives evidence accumulation. In the RL-ARD models, the advantages (Δ) in Q-values are accumulated as well, plus the evidence-independent baseline term V0. The gray icons indicate the influence of the Q-value sum Σ on evidence accumulation, which is not included in the limited variant of the RL-ARD. In all panels, bold-italic characters indicate parameters. Q1 and Q2 are the Q-values of the two choice options, which are updated according to a delta learning rule (Equation 1 at the bottom of the graph) with learning rate α.
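
To make these linking equations concrete, the sketch below (our illustration, not the authors' code) simulates a single RL-ARD trial: the delta rule of Equation 1 updates the Q-values, and each accumulator's drift rate is the urgency intercept V0 plus the weighted Q-value advantage (weight wd) and the weighted Q-value sum (weight ws). All parameter values and the Euler-Maruyama step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def delta_rule(q, choice, reward, alpha=0.1):
    """Equation 1: Q(choice) <- Q(choice) + alpha * (reward - Q(choice))."""
    q = q.copy()
    q[choice] += alpha * (reward - q[choice])
    return q

def rl_ard_trial(q, v0=1.0, wd=2.0, ws=0.5, a=1.5, t0=0.3, s=1.0, dt=1e-3):
    """One trial: two diffusions race to a single upper bound a (no lower bound)."""
    # Drift = urgency + weighted Q-value advantage + weighted Q-value sum.
    v = np.array([v0 + wd * (q[0] - q[1]) + ws * (q[0] + q[1]),
                  v0 + wd * (q[1] - q[0]) + ws * (q[0] + q[1])])
    x, t = np.zeros(2), 0.0
    while (x < a).all():                      # Euler-Maruyama simulation
        x += v * dt + s * np.sqrt(dt) * rng.standard_normal(2)
        t += dt
    return int(x.argmax()), t0 + t            # choice and response time

q = np.array([0.0, 0.0])
choice, rt = rl_ard_trial(q)
reward = float(rng.random() < (0.8 if choice == 0 else 0.2))  # probabilistic feedback
q = delta_rule(q, choice, reward)
```
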
Figure 2. Paradigms for experiments 1–3.
(A) Example trial for experiments 1 and 3. Each trial starts with a fixation cross, followed by the presentation of the stimulus (shown until a choice is made or 2.5 s elapses), a brief highlight of the chosen option, and probabilistic feedback. Reward probabilities are summarized in (B). Percentages indicate the probabilities of receiving +100 points for a choice (0 points otherwise). The actual symbols used differed between experiments and participants. In experiment 3, the acquisition phase lasted 61–68 trials (uniformly sampled each block), after which the reward contingencies for each stimulus set reversed. (C) Example trial for experiment 2, which added a cue prior to each trial (‘SPD’ or ‘ACC’) and made feedback contingent on both the choice and its timing. In the SPD condition, RTs under 600 ms were considered in time; slower responses were too slow. In the ACC condition, choices were in time as long as they were made within the 1.5 s stimulus window. Positive feedback (‘Outcome: +100’, ‘Reward: +100’) was shown in green letters; negative feedback (‘Outcome: 0’, ‘Reward: 0’, ‘Too slow!’) was shown in red letters.
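
The timing rule of experiment 2 reduces to a cue-dependent deadline; the sketch below is a hypothetical reconstruction of the feedback logic described above (function and message names are ours, not the task code).

```python
import random

def feedback(p_reward, rt, cue):
    """Points and message under the experiment-2 rule sketched above."""
    deadline = 0.6 if cue == "SPD" else 1.5   # SPD: 600 ms; ACC: 1.5 s stimulus window
    if rt > deadline:
        return 0, "Too slow!"
    points = 100 if random.random() < p_reward else 0
    return points, "Reward: +100" if points else "Reward: 0"
```
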
Figure 3. Comparison of posterior predictive distributions of the four RL-EAMs.
Data (black) and posterior predictive distribution (blue) of the RL-DDM (left column), RL-RD, RL-lARD, and RL-ARD (right column). Top row depicts accuracy over trial bins. Middle and bottom rows show 10th, 50th, and 90th RT percentiles for the correct (middle row) and error (bottom row) response over trial bins. Shaded areas correspond to the 95% credible interval of the posterior predictive distributions. All data are collapsed across participants and difficulty conditions.
Figure 3—figure supplement 1. Comparison of posterior predictive distributions of four additional RL-DDMs.
Data are shown as black dots and lines; posterior predictive distributions in blue. Top row depicts accuracy over trial bins. Middle and bottom rows show 10th, 50th, and 90th RT percentiles for the correct (middle row) and error (bottom row) response over trial bins. Shaded areas correspond to the 95% credible interval of the posterior predictive distributions. All data are collapsed across participants and difficulty conditions. The summed BPICs were 7717 (RL-DDM A1), 7636 (RL-DDM A2), 4844 (RL-DDM A3), and 4884 (RL-DDM A4). Hence, the largest improvement in the RL-DDM's quality of fit was obtained by adding st0 (between-trial variability in non-decision time).
Figure 3—figure supplement 2. Parameter recovery of the RL-ARD model, using the experimental paradigm of experiment 1.
Parameter recovery was done by first fitting the RL-ARD model to the empirical data, and then simulating the exact same experimental paradigm (208 trials, 55 subjects, four difficulty conditions) using the median parameter estimates obtained from the model fit. Subsequently, the RL-ARD was fit to the simulated data. The recovered median posterior estimates (y-axis) are plotted against the data-generating values (x-axis). Pearson’s correlation coefficient r and the root mean square error (RMSE) are shown in each panel. Diagonal lines indicate the identity x = y.
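
The two statistics reported in each recovery panel can be computed as follows; this is a generic sketch, not the authors' analysis script.

```python
import numpy as np

def recovery_stats(generating, recovered):
    """Pearson's r and RMSE between data-generating and recovered parameters."""
    generating, recovered = np.asarray(generating), np.asarray(recovered)
    r = np.corrcoef(generating, recovered)[0, 1]
    rmse = np.sqrt(np.mean((recovered - generating) ** 2))
    return r, rmse

# e.g. recovery_stats([0.10, 0.20, 0.15], [0.12, 0.18, 0.16])
```
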
Figure 3—figure supplement 3. Confusion matrices showing model separability.
Here, we first fit the RL-DDM and RL-ARD to the empirical data of experiment 1, and then simulated 50 full datasets with each model, using the exact same experimental paradigm (208 trials, 55 subjects, four difficulty conditions) and the median parameter estimates obtained from the model fits. Hence, in total 100 full datasets (5500 subjects) were simulated. We then fit both the RL-DDM and RL-ARD to all 100 simulated datasets. The matrices visualize the model confusability when using the BPIC (left column) or minimum deviance (right column) as a model comparison metric. The minimum deviance is a measure of quality of fit without a penalty for model complexity. The top row shows the model comparisons per dataset (using the summed BPICs / minimum deviances); the bottom row shows the model comparisons per subject. Model comparisons using the BPIC perfectly identified the data-generating model when summing BPICs across subjects. For individual subjects, the RL-ARD incorrectly won model comparisons for 82 subjects (3%), and the RL-DDM incorrectly won for 4 subjects (0.1%). Interestingly, when using the summed minimum deviances as a model comparison metric (thus not penalizing model complexity), the across-subject model comparisons also perfectly identified the data-generating model. For individual subjects, the RL-ARD incorrectly won comparisons for 333 subjects (12%). Combined, this indicates that the RL-ARD, while more complex, is generally not sufficiently flexible to outperform the RL-DDM in quality of fit on data generated by the RL-DDM.
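
The bookkeeping behind these confusion matrices can be sketched as follows: given a summed BPIC for each fitted model on each simulated dataset, the model with the lower BPIC wins, and wins are counted per data-generating model. The array layout and toy numbers are our assumptions.

```python
import numpy as np

def confusion_matrix(bpic):
    """bpic[g, f, d]: summed BPIC of fitted model f on dataset d generated by
    model g. Lower BPIC wins; rows index the generating model, columns the winner."""
    winners = np.argmin(bpic, axis=1)                  # winning model per (g, d)
    return np.stack([(winners == f).sum(axis=1)
                     for f in range(bpic.shape[1])], axis=1)

rng = np.random.default_rng(1)
toy = rng.normal(size=(2, 2, 50))   # 2 generators x 2 fitted models x 50 datasets
toy[0, 0] -= 5                      # model 0 fits its own data best
toy[1, 1] -= 5                      # model 1 fits its own data best
print(confusion_matrix(toy))        # approximately diagonal counts
```
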
Figure 3—figure supplement 4. Empirical (black) and posterior predictive (blue) defective probability densities of the RT distributions estimated using kernel density approximation.
The error RT distributions are shown as negative RTs for visualization. Blue lines represent 100 posterior predictive RT distributions from the RL-ARD model. The grand average is the RT distribution across all trials and subjects; subject-wise RT distributions are across all trials per subject for the first ten subjects, for whom the quality of fit was representative of the entire dataset.
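
The defective-density plots used in this and later supplements can be sketched as follows: error RTs are mirrored onto the negative axis, and each density branch is weighted by its response proportion so the branches jointly integrate to 1. A Gaussian KDE and toy data stand in for the paper's kernel density approximation.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

def plot_defective_density(rt, correct, ax):
    signed = np.where(correct, rt, -rt)         # errors mirrored to negative RTs
    for mask, name in [(correct, "correct"), (~correct, "error")]:
        if mask.sum() < 2:
            continue
        kde = gaussian_kde(signed[mask])
        grid = np.linspace(signed[mask].min(), signed[mask].max(), 200)
        # Weight by response proportion so both branches together integrate to 1.
        ax.plot(grid, kde(grid) * mask.mean(), label=name)

rng = np.random.default_rng(2)
rt = rng.gamma(4, 0.15, 500) + 0.2              # toy RTs
correct = rng.random(500) < 0.8                 # toy accuracy
fig, ax = plt.subplots()
plot_defective_density(rt, correct, ax)
ax.set_xlabel("RT (s; errors negative)")
ax.legend()
```
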
Figure 4. Data (black) and posterior predictive distribution of the RL-ARD (blue), separately for each difficulty condition.
Column titles indicate the reward probabilities, with 0.6/0.4 being the most difficult, and 0.8/0.2 the easiest condition. Top row depicts accuracy over trial bins. Middle and bottom rows show 10th, 50th, and 90th RT percentiles for the correct (middle row) and error (bottom row) response over trial bins. Shaded areas correspond to the 95% credible interval of the posterior predictive distributions. All data and fits are collapsed across participants.
Figure 4—figure supplement 1. Data (black) and posterior predictive distribution of the RL-DDM (blue), separately for each difficulty condition.
Row titles indicate the reward probabilities, with 0.6/0.4 being the most difficult, and 0.8/0.2 the easiest condition. Top row depicts accuracy over trial bins. Middle and bottom rows show 10th, 50th, and 90th RT percentiles for the correct (middle row) and error (bottom row) response over trial bins. Shaded areas correspond to the 95% credible interval of the posterior predictive distributions. All data are collapsed across participants.
Figure 4—figure supplement 2. Data (black) and posterior predictive distribution of the RL-ARD (blue), separately for each difficulty condition, excluding 17 subjects who had perfect accuracy in the first bin of the easiest condition.
Row titles indicate the reward probabilities, with 0.6/0.4 being the most difficult, and 0.8/0.2 the easiest condition. Top row depicts accuracy over trial bins. Middle and bottom rows show 10th, 50th, and 90th RT percentiles for the correct (middle row) and error (bottom row) response over trial bins. Shaded areas correspond to the 95% credible interval of the posterior predictive distributions. All data are collapsed across participants.
Figure 4—figure supplement 3. Posterior predictive distribution of the RL-ALBA model on the data of experiment 1, with one column per difficulty condition.
The LBA assumes that, on every trial, two accumulators race deterministically toward a common bound b. Each accumulator i starts at a start point sampled from a uniform distribution [0, A], and accumulates evidence at a speed sampled from a normal distribution 𝒩(vi, si). In the RL-ALBA model, we used Equation 4 to link Q-values to LBA drift rates v1 and v2 (excluding the sW term, since the LBA assumes no within-trial noise). Instead of directly estimating threshold b, we estimated the difference B = b − A (which simplifies enforcing b > A). We used the following mildly informed priors for the hypermeans: V0 ~ 𝒩(2, 5); wd ~ 𝒩(9, 5), truncated at lower bound 0; ws ~ 𝒩(0, 3); s2 ~ 𝒩(1, 1); A ~ 𝒩(1, 1); B ~ 𝒩(3, 5), truncated at lower bound 0; and t0 ~ 𝒩(0.3, 0.5), truncated at lower bound 0.025 and upper bound 1. For the hyper-SDs, all priors were Γ(1, 1). The summed BPIC was 4836, indicating that the RL-ALBA performs slightly better than the RL-DDM with between-trial variabilities (BPIC = 4844) and better than the RL-lARD (BPIC = 4849), but not as well as the RL-ARD (BPIC = 4577).
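
Because LBA accumulation is deterministic within a trial, a single race can be simulated in closed form: each accumulator's finishing time is (b − start point)/drift rate, and the fastest accumulator with a positive drift wins. The sketch below uses illustrative parameter values and is not the fitting code.

```python
import numpy as np

rng = np.random.default_rng(3)

def lba_trial(v_means, s, A, B, t0):
    """One LBA race: uniform start points on [0, A], normal drift rates,
    deterministic accumulation to the common bound b = A + B."""
    b = A + B
    start = rng.uniform(0, A, size=len(v_means))
    drift = rng.normal(v_means, s)
    with np.errstate(divide="ignore"):
        # Accumulators with non-positive drift never reach the bound.
        finish = np.where(drift > 0, (b - start) / drift, np.inf)
    winner = int(np.argmin(finish))
    return winner, t0 + finish[winner]

choice, rt = lba_trial(v_means=[3.0, 1.5], s=1.0, A=1.0, B=3.0, t0=0.3)
```
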
Figure 4—figure supplement 4. Data (black) and posterior predictive distribution of the RL-ARD (blue), separately for each difficulty condition.
Column titles indicate the reward probabilities, with 0.6/0.4 being the most difficult, and 0.8/0.2 the easiest condition. Top row depicts accuracy over trial bins. Middle and bottom rows show 10th, 50th, and 90th RT percentiles for the correct (middle row) and error (bottom row) response over trial bins. Shaded areas correspond to the 95% credible interval of the posterior predictive distributions. All data and fits are collapsed across participants. Error bars depict standard errors.
Figure 5. The evolution of Q-values and their effect on drift rates in the RL-ARD.
Panel A depicts raw Q-values, separately for each difficulty condition (colors). Panels B and C depict the Q-value differences and the Q-value sums over time. The drift rates (panel D) are a weighted sum of the Q-value differences and Q-value sums, plus an intercept.
Figure 6. Data (black) and posterior predictive distributions (blue) of the best-fitting RL-DDM (left columns) and the winning RL-ARD model (right columns), separately for the speed and accuracy emphasis conditions.
Top row depicts accuracy over trial bins. Middle and bottom rows show 10th, 50th, and 90th RT percentiles for the correct (middle row) and error (bottom row) response over trial bins. Shaded areas in the middle and right column correspond to the 95% credible interval of the posterior predictive distribution.
Figure 6—figure supplement 1. Data (black) of experiment 2 and posterior predictive distribution (blue) of the RL-DDM A3 with separate thresholds for the SAT conditions, and between-trial variabilities in drift rates, start points, and non-decision times.
The corresponding summed BPIC was −861, an improvement over the RL-DDM, but it was still outperformed by the RL-ARD (ΔBPIC = 232 in favor of the RL-ARD). Top row depicts accuracy over trial bins. Middle and bottom rows show 10th, 50th, and 90th RT percentiles for the correct (middle row) and error (bottom row) response over trial bins. Left and right columns are the speed and accuracy emphasis conditions, respectively. Shaded areas correspond to the 95% credible interval of the posterior predictive distributions.
Figure 6—figure supplement 2. Parameter recovery of the RL-ARD model, using the experimental paradigm of experiment 2.
Parameter recovery was done by first fitting the RL-ARD model to the empirical data, and then simulating the exact same experimental paradigm (19 subjects, three difficulty conditions, 2 SAT conditions, 312 trials) using the median parameter estimates obtained from the model fit. Subsequently, the RL-ARD was fit to the simulated data. The median posterior estimates (y-axis) are plotted against the data-generating values (x-axis). Pearson’s correlation coefficient r and the root mean square error (RMSE) are shown in each panel. Diagonal lines indicate the identity x = y.
Figure 6—figure supplement 3. Mean RT (left column) and choice accuracy (right column) across trial bins (x-axis) for experiments 2 and 3 (rows).
Block numbers are color-coded. Error bars are 1 SE. Mixed effects models indicated that in experiment 2, RTs decreased with block number (b = −0.04, SE = 6.15 × 10^−3, 95% CI [−0.05, −0.03], p = 6.61 × 10^−10) as well as with trial bin (b = −0.02, SE = 2.11 × 10^−3, 95% CI [−0.02, −0.01], p = 1.68 × 10^−13), and there was an interaction between trial bin and block number (b = 3.61 × 10^−3, SE = 9.86 × 10^−4, 95% CI [0.00, 0.01], p = 2.52 × 10^−4). There was a main effect of (log-transformed) trial bin on accuracy (on a logit scale: b = 0.36, SE = 0.11, 95% CI [0.15, 0.57], p = 7.99 × 10^−4), but no effect of block number, nor an interaction between block number and trial bin on accuracy.

In experiment 3, response times increased with block number (b = 0.02, SE = 3.10 × 10^−3, 95% CI [0.01, 0.02], p = 1.21 × 10^−7) and decreased with trial bin (b = −4.24 × 10^−3, SE = 1.37 × 10^−3, 95% CI [−6.92 × 10^−3, −1.56 × 10^−3], p = 0.002), but there was no interaction between trial bin and block number (b = −9.15 × 10^−4, SE = 5 × 10^−4, 95% CI [0.00, 0.00], p = 0.067). The bottom left panel suggests that the main effect of block number on RT is largely caused by an increase in RT after the first block. Accuracy decreased with (log-transformed) trial bin (on a logit scale: b = −0.12, SE = 0.05, 95% CI [−0.22, −0.02], p = 0.02) and with block number (b = −0.08, SE = 0.03, 95% CI [−0.14, −0.02], p = 0.009), but there was no interaction (b = 0.02, SE = 0.02, 95% CI [−0.02, 0.06], p = 0.276). The decrease in accuracy with trial bin is expected due to the presence of reversals. The combination of an increase in RT and a decrease in accuracy after the first block could indicate that participants learnt the structure of the task (i.e. the presence of reversals) in the first block and adjusted their behavior accordingly. In line with this speculation, accuracy in trial bin 6 (in which the reversal occurred) was lowest in the first block, which suggests that participants adjusted to the reversal faster in the later blocks.

In experiment 4, response times decreased with block number (b = −0.04, SE = 9.08 × 10^−3, 95% CI [−0.06, −0.02], p = 3.19 × 10^−3), and there was an interaction between block number and trial bin (b = −4.31 × 10^−3, SE = 1.45 × 10^−3, 95% CI [−0.01, 0.00], p = 0.003), indicating that the decrease of RTs over trial bins was larger for the later blocks. There was no main effect of trial bin on RTs. There was a main effect of (log-transformed) trial bin on accuracy (on a logit scale: b = 0.60, SE = 0.07, 95% CI [0.47, 0.73], p < 10^−16), but no main effect of block and no interaction between block and trial bin.
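
These models were fit with lme4/lmerTest in R; below is a rough Python analogue using statsmodels' MixedLM with random intercepts per subject, run on synthetic data (column names and effect sizes are assumptions for illustration only).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "subject":   np.repeat(np.arange(20), 60),
    "block":     np.tile(np.repeat(np.arange(3), 20), 20),
    "trial_bin": np.tile(np.arange(20), 60),
})
# Synthetic RTs with negative trial-bin and block effects, as in experiment 2.
df["rt"] = (0.8 - 0.02 * df["trial_bin"] - 0.04 * df["block"]
            + rng.normal(0, 0.1, len(df)))
fit = smf.mixedlm("rt ~ trial_bin * block", df, groups=df["subject"]).fit()
print(fit.summary())
```
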
Figure 6—figure supplement 4. Empirical (black) and posterior predictive (blue) defective probability densities of the RT distributions of experiment 2, estimated using kernel density approximation.
Negative RTs correspond to error RTs. Blue lines represent 100 posterior predictive RT distributions from the RL-ARD model. The grand average is the RT distribution across all trials and subjects; subject-wise RT distributions are across all trials per subject for the first 10 subjects, for whom the quality of fit was representative of the entire dataset.
Figure 6—figure supplement 5. Data (black) and posterior predictive distributions (blue) of the best-fitting RL-DDM (left columns) and the winning RL-ARD model (right columns), separately for the speed and accuracy emphasis conditions.
Top row depicts accuracy over trial bins. Middle and bottom rows show 10th, 50th, and 90th RT percentiles for the correct (middle row) and error (bottom row) response over trial bins. Shaded areas in the middle and right column correspond to the 95% credible interval of the posterior predictive distribution. Error bars depict standard errors.
Figure 7. Experiment 3 data (black) and posterior predictive distributions (blue) for the RL-DDM (left) and RL-ARD (right).
Top row: choice proportions over trials, with choice option A defined as the high-probability choice before the reversal in reward contingencies. Bottom row: 10th, 50th, and 90th RT percentiles. The data are ordered relative to the trial at which the reversal first occurred (trial 0, with negative trial numbers indicating trials prior to the reversal). Shaded areas correspond to the 95% credible interval of the posterior predictive distributions.
Figure 7—figure supplement 1. Data (black) of experiment 3 and posterior predictive of a standard soft-max learning model (blue).
As priors, we used β ~ 𝒩(1, 5), truncated at lower bound 0, for the hypermean, and Γ(1, 1) for the hyper-SD. Left panel depicts choice proportions for option A over trial bins, where choice A is defined as the high-probability reward choice prior to the reversal. Right panel depicts choice proportions over trials, aligned to the trial at which the reversal occurred (trial 0). Shaded areas correspond to the 95% credible interval of the posterior predictive distributions.
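
For reference, this baseline model combines the same delta-rule Q-value update used by the RL-EAMs with a soft-max choice rule governed by inverse temperature β; it predicts choice probabilities but not RTs. A minimal sketch with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(5)

def softmax(q, beta):
    e = np.exp(beta * (q - q.max()))   # subtract max for numerical stability
    return e / e.sum()

q, alpha, beta = np.zeros(2), 0.1, 3.0
for trial in range(100):
    choice = rng.choice(2, p=softmax(q, beta))
    reward = float(rng.random() < (0.8 if choice == 0 else 0.2))
    q[choice] += alpha * (reward - q[choice])   # same delta rule as the RL-EAMs
```
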
Figure 7—figure supplement 2. Data (black) of experiment 3 and posterior predictive distribution (blue) of the RL-DDM A3 (with between-trial variabilities in drift rates, start points, and non-decision times).
The summed BPIC was 11659, an improvement over the RL-DDM (ΔBPIC = 3940), but it did not outperform the RL-ARD (ΔBPIC = 112 in favor of the RL-ARD).
Figure 7—figure supplement 3. Parameter recovery of the RL-ARD model, using the experimental paradigm of experiment 3.
Parameter recovery was done by first fitting the RL-ARD model to the empirical data, and then simulating the exact same experimental paradigm (49 subjects, 2 difficulty conditions, 512 trials, including reversals) using the median parameter estimates obtained from the model fit. Subsequently, the RL-ARD was fit to the simulated data. The median posterior estimates (y-axis) are plotted against the data-generating values (x-axis). Pearson’s correlation coefficient r and the root mean square error (RMSE) are shown in each panel. Diagonal lines indicate the identity x = y.
Figure 7—figure supplement 4. Empirical (black) and posterior predictive (blue) defective probability densities of the RT distributions of experiment 3, estimated using kernel density approximation.
Negative RTs correspond to choices for the option that was correct after the reversal. Blue lines represent 100 posterior predictive RT distributions from the RL-ARD model. The grand average is the RT distribution across all trials and subjects; subject-wise RT distributions are across all trials per subject for the first 10 subjects, for whom the quality of fit was representative of the entire dataset.
Figure 7—figure supplement 5. Experiment 3 data (black) and posterior predictive distributions (blue) for the RL-DDM (left) and RL-ARD (right).
Top row: choice proportions over trials, with choice option A defined as the high-probability choice before the reversal in reward contingencies. Bottom row: 10th, 50th, and 90th RT percentiles. The data are ordered relative to the trial at which the reversal first occurred (trial 0, with negative trial numbers indicating trials prior to the reversal). Shaded areas correspond to the 95% credible interval of the posterior predictive distributions. Error bars depict standard errors.
Figure 8. The evolution of Q-values and their effect on drift rates in the RL-ARD in experiment 3, aggregated across participants.
Left panel depicts raw Q-values, separately for each difficulty condition (colors). The second and third panels depict the Q-value differences and the Q-value sums over time. The drift rates (right panel) are a weighted sum of the Q-value differences and Q-value sums, plus an intercept. Choice A (solid lines) refers to the option that had the high probability of reward during the acquisition phase, and choice B (dashed lines) to the option that had the high probability of reward after the reversal.
Figure 9. Architecture of the three-alternative RL-ARD.
In three-choice settings, there are three Q-values. The multi-alternative RL-ARD has one accumulator per directional pairwise difference, hence there are six accumulators. The bottom graph visualizes the connections between Q-values and drift rates (V0 is left out to improve readability). The equations formalize the within-trial dynamics of each accumulator. Top panels illustrate one example trial, in which both accumulators corresponding to response option 1 reached their thresholds. In this example trial, the model chose option 1, with the RT determined by the slower of the two winning accumulators (here, the leftmost accumulator). The decision-related parameters V0, wd, ws, a, and t0 are identical across the six accumulators.
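
A minimal sketch of this win-all rule (our illustration, with assumed parameter values): six advantage accumulators race, an option is chosen as soon as both of its accumulators have crossed the threshold a, and the slower of those two crossings determines the decision time.

```python
import numpy as np

rng = np.random.default_rng(6)

def ard3_trial(q, v0=1.0, wd=2.0, ws=0.3, a=2.0, t0=0.3, s=1.0, dt=1e-3):
    pairs = [(i, j) for i in range(3) for j in range(3) if i != j]  # 6 accumulators
    v = np.array([v0 + wd * (q[i] - q[j]) + ws * (q[i] + q[j]) for i, j in pairs])
    x = np.zeros(6)
    crossed = np.full(6, np.nan)        # first-passage time of each accumulator
    t = 0.0
    while True:
        t += dt
        x += v * dt + s * np.sqrt(dt) * rng.standard_normal(6)
        crossed = np.where(np.isnan(crossed) & (x >= a), t, crossed)
        for opt in range(3):
            mine = [k for k, (i, _) in enumerate(pairs) if i == opt]
            if not np.isnan(crossed[mine]).any():      # both accumulators crossed
                return opt, t0 + crossed[mine].max()   # slower crossing sets the RT

choice, rt = ard3_trial(q=np.array([0.6, 0.4, 0.4]))
```
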
Figure 10. Experimental paradigm of experiment 4.
(A) Example trial of experiment 4. Each trial started with a fixation cross, followed by the stimulus (three choice options, shown until the subject made a choice, up to 3 s), a brief highlight of the chosen option, and presentation of the choice’s reward. (B) Reward contingencies for the target stimulus and the two distractors per condition. Percentages indicate the probability of receiving +100 points (+0 otherwise). Presented symbols are examples; the actual symbols differed per block and participant (counterbalanced to prevent potential item effects from confounding the learning process).
Figure 11. Data (black) and posterior predictive distribution of the RL-ARD (blue), separately for each difficulty condition.
Column titles indicate the magnitude and difficulty condition. Top row depicts accuracy over trial bins. Middle and bottom rows show 10th, 50th, and 90th RT percentiles for the correct (middle row) and error (bottom row) response over trial bins. The error responses are collapsed across distractors. Shaded areas correspond to the 95% credible interval of the posterior predictive distributions. All data and fits are collapsed across participants.
Figure 11—figure supplement 1. Parameter recovery of the multi-alternative Win-All RL-ARD model, using the experimental paradigm of experiment 4.
Parameter recovery was done by first fitting the RL-ARD model to the empirical data, and then simulating the exact same experimental paradigm (34 subjects, 4 conditions, 432 trials) using the median parameter estimates obtained from the model fit. Subsequently, the RL-ARD was fit to the simulated data. The median posterior estimates (y-axis) are plotted against the data-generating values (x-axis). Pearson’s correlation coefficient r and the root mean square error (RMSE) are shown in each panel. Diagonal lines indicate the identity x = y.
Figure 11—figure supplement 2. Empirical (black) and posterior predictive (blue) defective probability densities of the RT distributions of experiment 4, estimated using kernel density approximation.
Negative RTs correspond to error choices, collapsed across the two distractor choice options. Blue lines represent 100 posterior predictive RT distributions from the RL-ARD model. The grand average is the RT distribution across all trials and subjects; subject-wise RT distributions are across all trials per subject for the first 10 subjects, for whom the quality of fit was representative of the entire dataset.
Figure 11—figure supplement 3. Data (black) and posterior predictive distribution of the RL-ARD (blue), separately for each difficulty condition of experiment 4.
Column titles indicate the magnitude and difficulty condition. Top row depicts accuracy over trial bins. Middle and bottom rows show 10th, 50th, and 90th RT percentiles for the correct (middle row) and error (bottom row) response over trial bins. The error responses are collapsed across distractors. Shaded areas correspond to the 95% credible interval of the posterior predictive distributions. All data and fits are collapsed across participants. Error bars depict standard errors.
Figure 12. Q-value evolution in experiment 4.
Top row corresponds to the low magnitude condition, bottom to the high magnitude condition. Colors indicate choice difficulty. (A) Q-values for target (QT) and distractor stimuli (QD). (B) Difference in Q-values, for target − distractor (ΔQT−D) and between the two distractors (ΔQD−D). The Q-value difference ΔQD−T is omitted from the graph to aid readability (but ΔQD−T = −ΔQT−D). (C) Sum of Q-values. (D) Resulting drift rates for the target response accumulators (vT−D) and the accumulators for the distractor choice options (vD−T, vD−D). Note that within each condition there is a single Q-value trace per choice option, but since there are two distractors, there are two overlapping traces for ΔQT−D, ΣQT+D, and for all drift rates.

Source: PubMed
