A computational approach to the N-back task

Analog N-back task

Subjects

Nine subjects (five women, four men; mean age: 21 years) participated. All subjects had normal or corrected-to-normal vision. Each subject completed four 1-hour sessions and was compensated at $10 per session, with a $10 completion bonus. The study adhered to the Declaration of Helsinki and was approved by the Institutional Review Board of New York University.

Apparatus and stimuli

The stimuli were displayed on an iPad Retina screen with a resolution of 1280 \(\times\) 960 pixels and a refresh rate of 60 Hz. The screen was attached to an adjustable arm that was mounted on a rail. A chin rest was mounted on the same rail. Subjects were seated in a dark room and viewed the display from a distance of 40 cm, so that 1 cm on the screen corresponded to approximately 1.4 degrees of visual angle (dva).

All stimuli were generated with Psychophysics Toolbox 3 in MATLAB [64]. The background was medium gray, with a luminance of around 39 cd/m\(^2\). The fixation cross was white with a luminance of around 195 cd/m\(^2\), and consisted of a horizontal and a vertical line segment, each of length 0.33 dva. An orientation stimulus was an oriented ellipse with a long axis of 2.9 dva, a short axis of 1.7 dva, and a luminance of around 120 cd/m\(^2\). The orientation of an ellipse was drawn from a discrete uniform distribution on the integers between 0° and 179°. A color stimulus was a colored disc with a diameter of 2.9 dva and a color drawn from a discrete uniform distribution on 360 color values (luminance: around 40 cd/m\(^2\)) that were evenly distributed along a circle of radius 59 in the fixed-\(L\) plane of CIE 1976 (\(L^*\), \(a^*\), \(b^*\)) color space, centered at \(L^* = 54\), \(a^* = 18\), \(b^* = 8\). The fixation cross, the ellipse, and the disc were all centered on the screen.
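For concreteness, the following MATLAB sketch (our illustration, not the authors' code) shows how 360 colors evenly spaced on such a circle in CIELAB could be generated. It assumes the Image Processing Toolbox's lab2rgb and ignores monitor calibration; in practice, out-of-gamut values would need clipping or a calibrated display transform.

```matlab
% Sketch: 360 colors on a circle of radius 59 in the fixed-L plane of
% CIELAB, centered at (L* = 54, a* = 18, b* = 8).
hueAngle = (0:359)' * pi/180;          % 360 evenly spaced angles (radians)
L = 54 * ones(360, 1);                 % fixed lightness
a = 18 + 59 * cos(hueAngle);           % a* coordinate on the circle
b =  8 + 59 * sin(hueAngle);           % b* coordinate on the circle
colorsRGB = lab2rgb([L a b]);          % CIELAB -> sRGB, values in [0, 1]
```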

Trial, sequence and procedure

An Orientation-Only (or Color-Only) sequence began with the appearance of the fixation cross for 1000 ms, followed by a sequence of oriented ellipses (or colored discs). Each ellipse (disc) was presented for 800 ms, followed by an inter-stimulus interval (ISI) of 1000 ms, during which the subject had to respond whether the stimulus they had just seen (the probe) had the same orientation (or color) as the stimulus presented two stimuli earlier in the sequence (the target). The subject pressed J for “same” and F for “different”. If the subject did not respond within the 1000 ms ISI, the trial was counted as a time-out. After the subject’s response or after the time-out, auditory feedback was given, indicating a correct response (high-pitched sound), an incorrect response (low-pitched sound), or a time-out (buzz sound) (Fig. 3). The trial procedure was the same for an Alternating sequence, except that a sequence of alternating ellipses and discs followed the fixation cross. As a result, the subject had to alternate between comparing the orientation of an ellipse (probe) to the previous oriented ellipse (target, still 2 stimuli earlier in the sequence) and comparing the color of a disc (probe) to the previous colored disc (target).

An Orientation-Only (Color-Only) sequence consisted of 53 stimuli, and an Alternating sequence consisted of 56 stimuli. The first two stimuli in each sequence were excluded from the analysis because they had no 2-back stimuli. Thus, the Orientation-Only (Color-Only) sequence had 51 valid stimuli, while the Alternating sequence, comprising 27 valid oriented ellipses and 27 valid colored discs, had 54 valid stimuli. Sequence lengths were chosen to ensure that the orientation or color of the probe was the same as that of the target (match) on exactly \(\frac{1}{3}\) of trials and different (non-match) on the remaining \(\frac{2}{3}\) of trials in a sequence. On non-match trials, the feature difference between the probe and target was randomly sampled from a uniform distribution on the integers between \(-90^{\circ}\) and \(90^{\circ}\) for orientation and between \(-180^{\circ}\) and \(180^{\circ}\) for color.
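A minimal sketch of how one such sequence could be generated (our illustration; the authors' trial-generation code is not described beyond these constraints). We additionally assume that a non-match offset of 0° is resampled, so that non-match trials truly differ from the target.

```matlab
% Sketch: one Orientation-Only sequence of 53 stimuli whose 51 valid
% probes match their 2-back target on exactly 1/3 of trials.
nStim = 53; nValid = nStim - 2;
matchValid = [true(nValid/3, 1); false(2*nValid/3, 1)];
matchValid = matchValid(randperm(nValid));      % shuffle match positions
ori = zeros(nStim, 1);
ori(1:2) = randi([0 179], 2, 1);                % first two are unconstrained
for i = 3:nStim
    if matchValid(i - 2)
        ori(i) = ori(i - 2);                    % match: probe = 2-back target
    else
        d = randi([-90 90]);                    % non-match offset in degrees
        while d == 0, d = randi([-90 90]); end  % assumption: 0 is resampled
        ori(i) = mod(ori(i - 2) + d, 180);      % keep within [0, 179]
    end
end
```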

The three sequence types gave rise to four trial conditions: an orientation trial in an Orientation-Only sequence (Ori), a color trial in a Color-Only sequence (Col), an orientation trial in an Alternating sequence (Ori-A), and a color trial in an Alternating sequence (Col-A). In the Ori-A and Col-A (alternating-stimulus) conditions, the 1-back stimulus was of a different feature dimension than the probe.

The experiment consisted of 16 Orientation-Only sequences, 16 Color-Only sequences, and 32 Alternating sequences, resulting in a total of 64 sequences. The sequences were grouped into blocks, each of which consisted of 4 sequences of the same type. Consequently, the experiment was composed of 4 Orientation-Only blocks, 4 Color-Only blocks, and 8 Alternating blocks. The 16 blocks were equally divided into 4 sessions. In the first session, each subject performed one block of each of the three sequence types, with the sequence type of the fourth block randomly chosen. In the remaining three sessions, the sequence type for each block was randomly selected. Overall, each subject completed 848 Ori trials, 848 Col trials, 896 Ori-A trials, and 896 Col-A trials. In the first session, at the start of each new sequence type, the subject practiced a 20-stimulus sequence of that type in the presence of the experimenter. Thus, the subject experienced three separate practice sequences in the first session. No practice took place in later sessions.

Modeling

No interference model

In the No Interference (NI) model, the 1-back distractor does not exert any influence on the observer’s response.

Encoding stage. We denote the target by \(s_\text{t}\) and the probe by \(s_\text{p}\). We further denote the observer’s noisy measurements of these stimuli by \(x_\text{t}\) and \(x_\text{p}\), respectively. We assume that these measurements follow von Mises distributions with means \(s_\text{t}\) and \(s_\text{p}\), respectively, and a shared concentration parameter \(\kappa\):

$$\begin{aligned} p(x_\text{p}|s_\text{p}; \kappa ) &= \frac{1}{2\pi I_0(\kappa )}e^{\kappa \cos (x_\text{p}-s_\text{p})}; \\ p(x_\text{t}|s_\text{t}; \kappa ) &= \frac{1}{2\pi I_0(\kappa )}e^{\kappa \cos (x_\text{t}-s_\text{t})}, \end{aligned}$$

(1)

where \(I_0\) is the modified Bessel function of the first kind of order 0. The orientation (color) of our stimuli has a range of \([-90^{\circ}, 90^{\circ}]\) (\([-180^{\circ}, 180^{\circ}]\)). However, in the models (i.e., equations and corresponding simulations), we remap the orientations to the range \([-180^{\circ}, 180^{\circ}]\), on which the von Mises distribution is defined. We used the true orientation space in all figures.
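As an illustration of this remapping (our reading of the text, following the common convention of doubling orientations so that the 180°-periodic orientation space covers the full circle):

```matlab
% Orientation s lives on a 180-degree circle; doubling maps it onto the
% 360-degree circle on which the von Mises density is defined.
s = 37;                % orientation in degrees, in [-90, 90)
sModel = 2 * s;        % value used in the model equations, in [-180, 180)
sFigure = sModel / 2;  % mapped back to true orientation space for figures
```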

Decision stage. Based on the noisy measurements of the probe and target, the observer has to decide whether the probe has the same feature as the target. We denote “same” by \(C=1\) and “different” by \(C=0\). The optimal way to make this same/different decision is to choose the \(C\) with the highest posterior probability. Thus, the optimal observer makes a decision based on the probabilities \(p(C=1|x_\text{p}, x_\text{t})\) and \(p(C=0|x_\text{p}, x_\text{t})\). We define the decision variable as the log posterior ratio, denoted by \(d_\text{NI}\), and apply Bayes’ rule:

$$\begin{aligned} d_\text{NI} &= \log \frac{p\left( C=1|x_\text{p},x_\text{t}\right) }{p\left( C=0|x_\text{p},x_\text{t}\right) } \\ &= \log \frac{p(C = 1)}{p(C = 0)} + \log \frac{p(x_\text{p}, x_\text{t}|C=1)}{p(x_\text{p}, x_\text{t}|C=0)}, \end{aligned}$$

(2)

where \(p(C = 1)\) is the observer’s prior probability that the probe matches the target. The optimal observer responds “same” when the posterior \(p(C=1|x_\text{p},x_\text{t})\) is greater than 0.5, or equivalently, when \(d_\text{NI}>0\). To compute the class likelihoods \(p(x_\text{p}, x_\text{t}|C=0)\) and \(p(x_\text{p}, x_\text{t}|C=1)\), the observer needs to marginalize (average) over the hidden variables \(s_\text{p}\) and \(s_\text{t}\). Thus,

$$\begin{aligned} d_\text{NI} = \log \frac{p(C = 1)}{p(C = 0)} + \log \frac{\iint p(x_\text{p}|s_\text{p})\, p(x_\text{t}|s_\text{t})\, p(s_\text{p}, s_\text{t}|C=1)\, ds_\text{p}\, ds_\text{t}}{\iint p(x_\text{p}|s_\text{p})\, p(x_\text{t}|s_\text{t})\, p(s_\text{p}, s_\text{t}|C=0)\, ds_\text{p}\, ds_\text{t}}. \end{aligned}$$

(3)

Substituting the distributions from the encoding model and evaluating the integrals, we find (see Appendix A for derivation):

$$\begin{aligned} d_\text{NI} = \log \frac{p(C = 1)}{p(C = 0)} + \log \frac{I_0\left( \kappa \sqrt{2+2\cos \Delta_{x\text{t}}}\right) }{I_0(\kappa )^2}. \end{aligned}$$

(4)

As expected based on symmetry considerations, the decision variable is solely determined by the difference \(\Delta_{x\text{t}} \equiv x_\text{p}-x_\text{t}\), and not by \(x_\text{p}\) and \(x_\text{t}\) individually.
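As a concrete transcription of Eq. (4) (a sketch under our notation, with angles in radians on the remapped circle; the parameter values are arbitrary examples):

```matlab
% Sketch: the NI decision variable of Eq. (4) and the resulting choice.
% pSame is the observer's prior p(C = 1); dxt = x_p - x_t.
dNI = @(dxt, kappa, pSame) log(pSame / (1 - pSame)) ...
    - 2*log(besseli(0, kappa)) ...
    + log(besseli(0, kappa * sqrt(2 + 2*cos(dxt))));
reportSame = dNI(pi/8, 4, 1/3) > 0;   % example: kappa = 4, prior = 1/3
```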

Response probabilities. The model predicts the probability that the observer reports “same” for a given stimulus combination \((s_\text{p}, s_\text{t})\), which we denote by \(p({\hat{C}} = 1|s_\text{p}, s_\text{t})\). On a single trial, the measurements \(x_\text{p}\) and \(x_\text{t}\) are not known to the experimenter. Thus, the experimenter needs to average over all possible values of \(x_\text{p}\) and \(x_\text{t}\) for a given stimulus combination to calculate the response probability on the trial in question. (This marginalization is conceptually very different from the one in the decision stage. There, the observer marginalizes over the hidden variables \(s_\text{p}\) and \(s_\text{t}\), which are unknown to the observer but known to the experimenter. Here, the experimenter marginalizes over the measurements, which are known to the observer but unknown to the experimenter.) The probability of reporting “same” for a given combination of \(s_\text{p}\) and \(s_\text{t}\) is given by:

$$\begin{aligned} p\left( {\hat{C}} = 1| s_\text{p}, s_\text{t}\right) &= \iint p\left( {\hat{C}} = 1|x_\text{p}, x_\text{t}\right) p\left( x_\text{p}|s_\text{p}\right) p\left( x_\text{t}|s_\text{t}\right) dx_\text{p}\, dx_\text{t} \\ &= \iint\limits_{(x_\text{t},\, x_\text{p}):\; d_\text{NI}>0} p\left( x_\text{p}|s_\text{p}\right) p\left( x_\text{t}|s_\text{t}\right) dx_\text{p}\, dx_\text{t}, \end{aligned}$$

(5)

where \(d_\text{NI}>0\) defines a region in the space spanned by \(x_\text{t}\) and \(x_\text{p}\). To calculate the response probability, we first computed the joint probability \(p(x_\text{p}|s_\text{p})\, p(x_\text{t}|s_\text{t})\) and then multiplied it by the mask defined by the region \(d_\text{NI}>0\). We evaluated the resulting integral numerically on a grid. As one would expect, the result depends only on \(\Delta_\text{t} \equiv s_\text{p} - s_\text{t}\), and not on \(s_\text{p}\) and \(s_\text{t}\) individually (see Appendix B for derivation). Therefore, we will write \(p({\hat{C}} = 1|\Delta_\text{t})\) instead of \(p({\hat{C}} = 1|s_\text{p}, s_\text{t})\).
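The following sketch (our own illustration of the grid evaluation, not the authors' implementation) computes this response probability for one value of \(\Delta_\text{t}\), exploiting the fact that we may set \(s_\text{t} = 0\) without loss of generality:

```matlab
% Sketch: numerical evaluation of Eq. (5) on a grid (angles in radians).
kappa = 4; pSame = 1/3; DeltaT = pi/6;            % example values
x = linspace(-pi, pi, 361);
[xt, xp] = meshgrid(x, x);                        % xt varies along columns
vm = @(d) exp(kappa*cos(d)) / (2*pi*besseli(0, kappa));  % von Mises pdf
dNI = log(pSame/(1 - pSame)) - 2*log(besseli(0, kappa)) ...
    + log(besseli(0, kappa*sqrt(2 + 2*cos(xp - xt))));   % Eq. (4) on the grid
joint = vm(xp - DeltaT) .* vm(xt);                % p(xp|sp) p(xt|st), st = 0
pSameResp = trapz(x, trapz(x, joint .* (dNI > 0), 2));   % mask and integrate
```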

Lapse rate. Due to lapses of attention, the observer might randomly guess on a small proportion of trials. We take this possibility into account by including a lapse rate in the NI model. On each trial, there is a probability \(\lambda\) that the observer generates a random guess. The predicted probability of reporting “same” then becomes:

$$\begin{aligned} p\left( {\hat{C}} = 1| s_\text{p}, s_\text{t}\right) = \frac{\lambda }{2}+(1-\lambda ) \iint\limits_{(x_\text{t},\, x_\text{p}):\; d_\text{NI}>0} p\left( x_\text{p}|s_\text{p}\right) p\left( x_\text{t}|s_\text{t}\right) dx_\text{p}\, dx_\text{t}. \end{aligned}$$

(6)

Early pooling model

Encoding stage. The Early Pooling (EP) model postulates that the measurement of the target, \(x_\text{t}\), gets mixed with the measurement of the distractor, \(x_\text{d}\); that is, interference occurs in the encoding stage. To specify this mixing, we introduce the two-dimensional unit vectors corresponding to these measurements (with angles in the range \([-180^{\circ}, 180^{\circ}]\)), which we denote by \({\textbf{X}}_\text{t}\) and \({\textbf{X}}_\text{d}\), respectively. The mixed measurement is associated with a weighted average of these two unit vectors (see Figs. A1a and A2 for graphical illustration):

$$\begin{aligned} {\textbf{X}}_\text{mix} = w{\textbf{X}}_\text{t} + (1-w) {\textbf{X}}_\text{d}. \end{aligned}$$

(7)

As a result, the mixed measurement \(x_\text{mix}\) is the weighted circular mean of \(x_\text{t}\) and \(x_\text{d}\):

$$\begin{aligned} x_\text{mix} = \operatorname{atan2}\left( w\sin x_\text{t}+\left( 1-w\right) \sin x_\text{d},\; w\cos x_\text{t}+\left( 1-w\right) \cos x_\text{d}\right). \end{aligned}$$

(8)

The distractor measurement \(x_\text{d}\) also follows a von Mises distribution, with mean \(s_\text{d}\) and the same concentration parameter \(\kappa\) as \(x_\text{p}\) and \(x_\text{t}\).
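In code, the mixed measurement of Eq. (8) is simply (a sketch; the values are arbitrary examples in radians):

```matlab
% Sketch: the weighted circular mean of target and distractor measurements.
w = 0.7; xt = 0.3; xd = -1.2;                        % example values
xmix = atan2(w*sin(xt) + (1 - w)*sin(xd), ...
             w*cos(xt) + (1 - w)*cos(xd));           % Eq. (8)
```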

Decision stage. The observer makes the same/different decision based on \(p(C=1|x_\text{p}, x_\text{mix})\) and \(p(C=0|x_\text{p}, x_\text{mix})\), choosing the \(C\) with the highest posterior probability. Unlike in the NI model, these probabilities are based on the inputs \(x_\text{p}\) and \(x_\text{mix}\) rather than on \(x_\text{p}\) and \(x_\text{t}\). The log posterior ratio is now:

$$\begin{aligned} d_\text{EP} = \log \frac{p\left( C=1|x_\text{p},x_\text{mix}\right) }{p\left( C=0|x_\text{p},x_\text{mix}\right) } = \log \frac{p(C = 1)}{p(C = 0)} + \log \frac{p(x_\text{p}, x_\text{mix}|C=1)}{p(x_\text{p}, x_\text{mix}|C=0)}. \end{aligned}$$

(9)

The observer reports “same” when the posterior \(p(C = 1|x_\text{p}, x_\text{mix})\) is greater than 0.5, or equivalently, when \(d_\text{EP} > 0\). To compute \(d_\text{EP}\), the observer needs to compute the class likelihoods \(p(x_\text{p}, x_\text{mix}|C=0)\) and \(p(x_\text{p}, x_\text{mix}|C=1)\). To do so, like in the NI model, the observer needs to marginalize over the hidden variables \(s_\text{p}\) and \(s_\text{t}\), which are unknown to them. In addition, the observer should marginalize over the hidden variable \(s_\text{d}\), as its measurement \(x_\text{d}\) also affects \(x_\text{mix}\). Importantly, because the “pure” measurements \(x_\text{t}\) and \(x_\text{d}\) that determine \(x_\text{mix}\) are unknown to the observer, the observer needs to marginalize over all possible combinations of \(x_\text{t}\) and \(x_\text{d}\) that generate \(x_\text{mix}\). Thus, to compute \(d_\text{EP}\), the optimal observer performs a marginalization over five variables: \(s_\text{p}\), \(s_\text{t}\), \(s_\text{d}\), \(x_\text{t}\), and \(x_\text{d}\):

$$\begin{aligned} d_\text{EP} &= \log \frac{p(C = 1)}{p(C = 0)} \\ &\quad + \log \frac{\displaystyle\int \!\!\iiiint p\left( x_\text{mix}, x_\text{p}\,|\, x_\text{t},x_\text{d},s_\text{p},s_\text{t},s_\text{d}\right) p\left( x_\text{t},x_\text{d},s_\text{t},s_\text{d},s_\text{p}\,|\, C=1 \right) dx_\text{t}\,dx_\text{d}\,ds_\text{t}\,ds_\text{d}\,ds_\text{p}}{\displaystyle\int \!\!\iiiint p\left( x_\text{mix}, x_\text{p}\,|\, x_\text{t},x_\text{d},s_\text{p},s_\text{t},s_\text{d}\right) p\left( x_\text{t},x_\text{d},s_\text{t},s_\text{d},s_\text{p}\,|\, C=0 \right) dx_\text{t}\,dx_\text{d}\,ds_\text{t}\,ds_\text{d}\,ds_\text{p}}. \end{aligned}$$

(10)

The log posterior ratio evaluates to (see Appendix C for derivation):

$$\begin{aligned} d_\text{EP} = \left\{ \begin{array}{ll} 0 & \text{if } w < 0.5; \\[2ex] \log \displaystyle\frac{p(C = 1)}{p(C = 0)} + \log \displaystyle\frac{1}{I_0(\kappa)^2} + \log \left[ \displaystyle\frac{1}{2\arcsin \frac{1-w}{w}} \int_{-\arcsin \left( \frac{1-w}{w}\right) }^{\arcsin \left( \frac{1-w}{w}\right) } I_0\left( \kappa \sqrt{2 + 2\cos (x_\text{p}-x_\text{mix}-\tilde{x})}\right) d\tilde{x} \right] & \text{if } w \ge 0.5, \end{array}\right. \end{aligned}$$

(11)

where \(\tilde{x}\) is the difference between the hypothesized \(x_\text{t}\) and \(x_\text{d}\), which lies within the range \(\left[ -\arcsin \left( \frac{1-w}{w}\right), \arcsin \left( \frac{1-w}{w}\right) \right]\). The decision variable shows that, despite the “non-optimal” mixing at encoding, the observer can still make the best possible decision based on the compromised information by demixing: considering all possible combinations of \(x_\text{t}\) and \(x_\text{d}\) that could have generated \(x_\text{mix}\). The demixing is possible because the values of \(x_\text{t}\) and \(x_\text{d}\) for a given \(x_\text{mix}\) are constrained by:

$$\begin{aligned} x_\text{d} = \operatorname{atan2}\left( R\sin x_\text{mix}- w\sin x_\text{t},\; R\cos x_\text{mix}-w\cos x_\text{t}\right), \end{aligned}$$

(12)

where \(R = \sqrt{2w^2-2w+1+2w(1-w)\cos (x_\text{t}-x_\text{d})}\).

Here, we provide an intuition for the benefits of demixing. The purpose of demixing is to discount the effect of the distractor by reconstructing the possible values of \(x_\text{t}\), so that the observer can still compare \(x_\text{p}\) with \(x_\text{t}\) and then decide. When \(w < 0.5\), we show that \(x_\text{t}\) could be any value in the measurement domain, because for any \(x_\text{t}\) there is always an \(x_\text{d}\) that together with it generates a given \(x_\text{mix}\) (see Figs. A1b and A2a for illustrations). Having access to \(x_\text{mix}\) therefore tells the observer nothing about \(x_\text{t}\); their uncertainty about \(x_\text{t}\) is not reduced. As a result, a decision based on \(x_\text{p}\) and \(x_\text{mix}\) is no better than a random guess, because the observer gains no information about \(x_\text{t}\). In the extreme case of \(w = 0\), \(x_\text{mix}\) equals \(x_\text{d}\), and a binary decision based on \(x_\text{p}\) and \(x_\text{d}\) is no better than a coin flip. When \(w \ge 0.5\), by contrast, \(x_\text{t}\) becomes constrained by \(x_\text{mix}\) (see Figs. A1b,c and A2b). Specifically, \(x_\text{t}\) is confined to a range whose size varies with \(w\): the allowable range of \(x_\text{t}\) gradually narrows as \(w\) grows. Therefore, even though the observer has access only to \(x_\text{mix}\), knowing \(x_\text{mix}\) is informative about \(x_\text{t}\) and helps them decide. In the extreme case of \(w = 1\), \(x_\text{mix}\) equals \(x_\text{t}\), and the decision variable reduces to Eq. (4) of the NI model. (To help readers understand the demixing, we also provide an analogy with numbers in Fig. A3 in Appendix C.)
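The following numerical check (our own illustration under Eqs. (7)–(8), not from the paper) makes this concrete: sweeping \(x_\text{d}\) over the circle with \(x_\text{t}\) fixed shows how far \(x_\text{mix}\) can stray from \(x_\text{t}\).

```matlab
% Sketch: for w < 0.5, x_mix can land anywhere on the circle, so observing
% x_mix places no constraint on x_t; for w >= 0.5, the deviation of x_mix
% from x_t is bounded by asin((1-w)/w), so x_mix is informative about x_t.
xt = 0; xd = linspace(-pi, pi, 3601);
for w = [0.3 0.7]
    xmix = atan2(w*sin(xt) + (1 - w)*sin(xd), w*cos(xt) + (1 - w)*cos(xd));
    fprintf('w = %.1f: max |x_mix - x_t| = %.3f rad\n', w, max(abs(xmix)));
end
% Expected: ~3.14 rad for w = 0.3; asin(3/7) ~ 0.443 rad for w = 0.7.
```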

For the decision variable \(d_\text{EP}\), we thus have two cases, corresponding to \(w < 0.5\) and \(w \ge 0.5\), respectively, where \(\Delta_{x\text{d}} \equiv x_\text{p}-x_\text{d}\) (see Appendix C for derivation):

$$\begin{aligned} d_\text{EP} = \left\{ \begin{array}{ll} 0 & \text{if } w < 0.5, \\[2ex] \log \displaystyle\frac{p(C = 1)}{p(C = 0)} + \log \displaystyle\frac{1}{I_0(\kappa)^{2}} + \log \displaystyle\frac{\displaystyle\int_{-\arcsin \frac{1-w}{w}}^{\arcsin \frac{1-w}{w}} I_0\left( \kappa \sqrt{2+\frac{2\left( w\cos (\Delta_{x\text{t}}-\tilde{x}) + (1-w)\cos (\Delta_{x\text{d}}-\tilde{x})\right) }{\sqrt{2w^2-2w+1+2w(1-w)\cos (\Delta_{x\text{t}}-\Delta_{x\text{d}})}}}\right) d\tilde{x}}{2\arcsin \frac{1-w}{w}} & \text{if } w \ge 0.5. \end{array}\right. \end{aligned}$$

(13)
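For \(w \ge 0.5\), the integral over \(\tilde{x}\) can be evaluated numerically. A sketch (our illustration, with arbitrary example inputs, not the authors' code):

```matlab
% Sketch: the EP decision variable of Eq. (13) for w >= 0.5.
w = 0.7; kappa = 4; pSame = 1/3;                 % example parameter values
dxt = 0.2; dxd = 1.0;                            % Delta_xt and Delta_xd
A = asin((1 - w)/w);                             % demixing range half-width
xtil = linspace(-A, A, 201);                     % grid over x~
R = sqrt(2*w^2 - 2*w + 1 + 2*w*(1 - w)*cos(dxt - dxd));
f = besseli(0, kappa*sqrt(2 + 2*(w*cos(dxt - xtil) ...
    + (1 - w)*cos(dxd - xtil))/R));              % integrand of Eq. (13)
dEP = log(pSame/(1 - pSame)) - 2*log(besseli(0, kappa)) ...
    + log(trapz(xtil, f)/(2*A));
```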

Response probabilities. The EP model predicts the probability that the observer reports “same” given a combination of \(s_\text{p}\) and \(s_\text{mix}\), \(p({\hat{C}} = 1|s_\text{p}, s_\text{mix})\). Since \(s_\text{mix}\) is the weighted circular mean of \(s_\text{t}\) and \(s_\text{d}\), we instead write the response probability in terms of \(s_\text{p}\), \(s_\text{t}\), and \(s_\text{d}\), as \(p({\hat{C}} = 1|s_\text{p}, s_\text{t}, s_\text{d})\) (Eq. (13)). On a single trial, we average over all possible measurements \(x_\text{p}\), \(x_\text{d}\), and \(x_\text{t}\), all unknown to the experimenter, to calculate the response probability. Therefore, the probability of reporting “same” for a given stimulus combination of \(s_\text{p}\), \(s_\text{t}\), and \(s_\text{d}\) is given by:

$$\begin{aligned} p({\hat{C}} = 1|s_\text{p}, s_\text{t}, s_\text{d}) &= \iiint p\left( {\hat{C}} = 1|x_\text{p}, x_\text{t}, x_\text{d}\right) p\left( x_\text{p}|s_\text{p}\right) p\left( x_\text{t}|s_\text{t}\right) p\left( x_\text{d}|s_\text{d}\right) dx_\text{p}\, dx_\text{t}\, dx_\text{d} \\ &= \iiint\limits_{(x_\text{p},\, x_\text{d},\, x_\text{t}):\; d_\text{EP}> 0} p\left( x_\text{p}|s_\text{p}\right) p\left( x_\text{t}|s_\text{t}\right) p\left( x_\text{d}|s_\text{d}\right) dx_\text{p}\, dx_\text{t}\, dx_\text{d}, \end{aligned}$$

(14)

where \(d_\text{EP}>0\) defines a region in the three-dimensional space spanned by \(x_\text{t}\), \(x_\text{d}\), and \(x_\text{p}\). We evaluated the above integral numerically on a grid. The result depends on \(\Delta_\text{t} \equiv s_\text{p} - s_\text{t}\) and \(\Delta_\text{d} \equiv s_\text{p} - s_\text{d}\), but not on \(s_\text{p}\), \(s_\text{t}\), and \(s_\text{d}\) individually. Therefore, we will write \(p({\hat{C}} = 1|\Delta_\text{t}, \Delta_\text{d})\) instead of \(p({\hat{C}} = 1|s_\text{p}, s_\text{t}, s_\text{d})\). Finally, like in the NI model, we allow for a lapse rate.

Early pooling model without demixing. The EP model assumes that the observer has access only to the mixed measurement of the target and distractor, but still makes the best possible decision by performing demixing. An alternative early-pooling hypothesis is that the observer simply treats the mixed measurement as if it were the target measurement and compares it with the probe measurement. The decision variable of this naive Early Pooling (EPn) model is identical to that of the NI model except that the target measurement \(x_\text{t}\) is replaced by the mixed measurement \(x_\text{mix}\). Thus, we have

$$\begin{aligned} d_\text{EPn} &= \log \frac{p(C = 1)}{p(C = 0)} + \log \frac{I_0\left( \kappa \sqrt{2+2\cos (x_\text{p} - x_\text{mix})}\right) }{I_0(\kappa)^2} \\ &= \log \frac{p(C = 1)}{p(C = 0)} + \log \frac{1}{I_0(\kappa)^2} + \log I_0\left( \kappa \sqrt{2+\frac{2\left( w\cos \Delta_{x\text{t}} + (1-w)\cos \Delta_{x\text{d}}\right) }{\sqrt{2w^2-2w+1+2w(1-w)\cos (\Delta_{x\text{t}} -\Delta_{x\text{d}})}}}\right). \end{aligned}$$

(15)

Unlike in the EP model, there is no demixing. As a result, the EPn model always yields lower accuracy than the EP model for the same set of parameters (see Fig. A4 in Appendix D).

Late pooling model

In addition to the Early Pooling models, we also propose a Late Pooling (LP) model, which asserts that the mixing of the target and distractor occurs at a late stage of processing. Specifically, the observer employs a suboptimal decision rule obtained by mixing decision variables computed separately for the target and the distractor. Thus, on a single trial, the observer computes the decision variable based on the measurements of the probe and target, \(d_\text{NI}(x_\text{p},x_\text{t})\), and the decision variable based on the measurements of the probe and distractor, \(d_\text{NI}(x_\text{p},x_\text{d})\), and then makes a decision using a weighted average of the two:

$$\begin{aligned} d_\text{LP} &= w\, d_\text{NI}(x_\text{p},x_\text{t})+(1-w)\, d_\text{NI}(x_\text{p},x_\text{d}) \\ &= w\log \frac{p\left( C=1|x_\text{p},x_\text{t}\right) }{p\left( C=0|x_\text{p},x_\text{t}\right) } + (1-w)\log \frac{p\left( C=1|x_\text{p},x_\text{d}\right) }{p\left( C=0|x_\text{p},x_\text{d}\right) } \\ &= \log \frac{p(C = 1)}{p(C = 0)} + w\log \frac{p(x_\text{p}, x_\text{t}|C=1)}{p(x_\text{p}, x_\text{t}|C=0)}+(1-w)\log \frac{p(x_\text{p}, x_\text{d}|C=1)}{p(x_\text{p}, x_\text{d}|C=0)}. \end{aligned}$$

(16)

The observer reports “same” when \(d_\text{LP} > 0\).
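In code, Eq. (16) amounts to the following (a sketch under our notation; the prior term appears once because the two weights sum to one):

```matlab
% Sketch: the LP decision variable as a weighted average of NI decision
% variables. dLik is the likelihood-ratio part of Eq. (4).
dLik = @(dx, kappa) log(besseli(0, kappa*sqrt(2 + 2*cos(dx)))) ...
     - 2*log(besseli(0, kappa));
dLP = @(xp, xt, xd, w, kappa, pSame) log(pSame/(1 - pSame)) ...
    + w*dLik(xp - xt, kappa) + (1 - w)*dLik(xp - xd, kappa);
reportSame = dLP(0.1, 0.3, -1.2, 0.7, 4, 1/3) > 0;   % example trial
```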

Response probabilities. Like in the EP model, we average over all possible combinations of \(x_\text{p}\), \(x_\text{t}\), and \(x_\text{d}\) to calculate the response probabilities:

$$\begin{aligned} p({\hat{C}} = 1|s_\text{p}, s_\text{t}, s_\text{d}) &= \iiint p\left( {\hat{C}} = 1|x_\text{p}, x_\text{t}, x_\text{d}\right) p\left( x_\text{p}|s_\text{p}\right) p\left( x_\text{t}|s_\text{t}\right) p\left( x_\text{d}|s_\text{d}\right) dx_\text{p}\, dx_\text{t}\, dx_\text{d} \\ &= \iiint\limits_{(x_\text{p},\, x_\text{d},\, x_\text{t}):\; d_\text{LP} >0} p\left( x_\text{p}|s_\text{p}\right) p\left( x_\text{t}|s_\text{t}\right) p\left( x_\text{d}|s_\text{d}\right) dx_\text{p}\, dx_\text{t}\, dx_\text{d}. \end{aligned}$$

(17)

We can rewrite the decision variable in terms of \(\Delta_\text{t}\) and \(\Delta_\text{d}\) and evaluate the integral numerically on a grid.

Substitution model

Unlike the pooling models, the Substitution (Sub) model asserts that the observer swaps the target for the distractor on some proportion of trials. On a given trial, the observer chooses the \(C\) with the highest posterior probability, computed either from the measurements of \(s_\text{p}\) and \(s_\text{t}\), \(p(C|x_\text{p}, x_\text{t})\), or from the measurements of \(s_\text{p}\) and \(s_\text{d}\), \(p(C|x_\text{p}, x_\text{d})\). We define the decision variable as the log posterior ratio based either on \(x_\text{p}\) and \(x_\text{t}\):

$$\begin{aligned} d_\text{NI}(x_\text{p},x_\text{t}) &= \log \frac{p\left( C=1|x_\text{p},x_\text{t}\right) }{p\left( C=0|x_\text{p},x_\text{t}\right) } \\ &= \log \frac{p(C = 1)}{p(C = 0)} + \log \frac{p(x_\text{p}, x_\text{t}|C=1)}{p(x_\text{p}, x_\text{t}|C=0)}, \end{aligned}$$

(18)

or on \(x_\text{p}\) and \(x_\text{d}\):

$$\begin{aligned} d_\text{NI}(x_\text{p},x_\text{d}) &= \log \frac{p\left( C=1|x_\text{p},x_\text{d}\right) }{p\left( C=0|x_\text{p},x_\text{d}\right) } \\ &= \log \frac{p(C = 1)}{p(C = 0)} + \log \frac{p(x_\text{p}, x_\text{d}|C=1)}{p(x_\text{p}, x_\text{d}|C=0)}. \end{aligned}$$

(19)

Response probabilities. We first compute the probability that the observer reports “same” given \(s_\text{p}\) and \(s_\text{t}\), and given \(s_\text{p}\) and \(s_\text{d}\), separately. The resulting response probability is a weighted average of the two:

$$\begin{aligned} p({\hat{C}} = 1|s_\text{p},s_\text{d},s_\text{t}) = w\, p({\hat{C}} = 1|s_\text{p},s_\text{t})+(1-w)\, p({\hat{C}} = 1|s_\text{p},s_\text{d}), \end{aligned}$$

(20)

where \(w\) and \(1-w\) denote the probabilities that the observer bases their “same” response on \(p({\hat{C}} = 1|s_\text{p}, s_\text{t})\) and on \(p({\hat{C}} = 1|s_\text{p}, s_\text{d})\), respectively (Eq. (4)). The final response probability depends only on \(\Delta_\text{t} \equiv s_\text{p} - s_\text{t}\) and \(\Delta_\text{d} \equiv s_\text{p} - s_\text{d}\), not on \(s_\text{p}\), \(s_\text{d}\), and \(s_\text{t}\) individually (Eq. (5)).
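In code, Eq. (20) is a one-liner (a sketch; pNIsame stands for the NI-model response probability \(p({\hat{C}} = 1|\Delta)\) of Eq. (5) and is a hypothetical function handle here):

```matlab
% Sketch: the Sub model's response probability as a probability-weighted
% mixture of two NI response probabilities (Eq. (20)).
pSub = @(dT, dD, w, pNIsame) w*pNIsame(dT) + (1 - w)*pNIsame(dD);
```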

Interference models with multiple distractors

We have so far derived interference models that consider only one distractor, which we refer to as one-distractor models. However, with the exception of the EP model, all models can readily be adapted to account for interference from multiple distractors. In the 2-back task, both the 1-back and the 3-back stimuli in the sequence could contribute to interference. In the presence of two distractors, the EPn model assumes that the observer only has access to a weighted-average measurement of the target and both distractors. The LP model assumes that the observer averages, with different weights, the decision variables derived separately from the target, the 1-back stimulus, and the 3-back stimulus. The Sub model, meanwhile, posits that the observer mistakes either the 1-back or the 3-back stimulus for the target, with different probabilities. Therefore, introducing a 3-back distractor requires only one additional parameter, representing its weight in the mixing or substitution process (see Appendix E for details); a minimal sketch follows below. For the EP model, accommodating two distractors would require discerning all potential measurements of the target, 1-back, and 3-back stimuli from a single mixed measurement, which is psychologically implausible. The same principles apply when extending the models to cases where N exceeds 2, such as capturing the interference induced by the 1-back, 2-back, and potentially 4-back stimuli in a 3-back task.
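As one concrete example of this extension (our sketch; the exact parameterization is given in Appendix E, which we do not reproduce, so the weight normalization below is an assumption), a two-distractor LP decision variable could read:

```matlab
% Sketch: two-distractor LP rule. The target weight is assumed to be
% 1 - w1b - w3b, so the 3-back stimulus adds a single parameter w3b.
dLik = @(dx, kappa) log(besseli(0, kappa*sqrt(2 + 2*cos(dx)))) ...
     - 2*log(besseli(0, kappa));
dLP2 = @(dT, d1b, d3b, w1b, w3b, kappa, pSame) log(pSame/(1 - pSame)) ...
     + (1 - w1b - w3b)*dLik(dT, kappa) ...
     + w1b*dLik(d1b, kappa) + w3b*dLik(d3b, kappa);
```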

Model fitting, model comparison, and parameter recovery

The NI model has three free parameters: a concentration parameter \(\kappa\), a lapse rate \(\lambda\), and a prior parameter \(p_\text{same}\) in the decision variable. Apart from these three parameters shared with the NI model, each one-distractor interference model incorporates an additional parameter, \(w_\text{d1b}\), reflecting the influence of the 1-back stimulus. Similarly, each two-distractor model introduces a parameter, \(w_\text{d3b}\), to represent the impact of the 3-back stimulus.

All models were fitted to the data of individual subjects. We optimized the log likelihood (LL) of each model using Bayesian Adaptive Direct Search (BADS) [65]. BADS alternates between a series of fast, local Bayesian optimization steps and a systematic, slower exploration of a mesh grid. To compare models, we computed the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) from the LL, drawing conclusions only when the two measures agreed.
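A sketch of the fitting and comparison pipeline follows. Here, bads is the entry point of the BADS toolbox, while nll, data, and all bounds are hypothetical placeholders for a model's negative log-likelihood function, one subject's trials, and plausible parameter ranges.

```matlab
% Sketch: fit one model to one subject with BADS, then score it.
theta0 = [4, 0.02, 1/3];                    % [kappa, lambda, p_same] guess
lb  = [0 0 0];      ub  = [200 1 1];        % hard bounds (illustrative)
plb = [0.5 0 0.05]; pub = [50 0.2 0.95];    % plausible bounds (illustrative)
[thetaHat, negLL] = bads(@(th) nll(th, data), theta0, lb, ub, plb, pub);
k = numel(thetaHat);                        % number of free parameters
n = size(data, 1);                          % number of trials
AIC = 2*k + 2*negLL;                        % lower is better
BIC = k*log(n) + 2*negLL;
```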

We verified that the parameters of our models can be accurately recovered, with as few as 200 trials providing robust parameter recovery (see Fig. A10 in Appendix I).