There are many situations in which one might need to determine, probabilistically, whether a single observation belongs to some population distribution. In some areas, such as neuropsychology, this methodology has been essential for the advancement of the field, because brain-damaged patients often present unique patterns of associated cognitive impairments. A single patient might then be the only available source of data for a phenomenon of interest. In addition, clinical diagnoses and descriptions of patient impairments are always performed for single cases.
Patients with brain damage are hence often compared to a distribution of cognitive functioning in the healthy population, both in research and in clinical contexts. However, the population parameters of such a distribution are not always known, and when they are not, they must be estimated from a sample – that is, the single case is compared to a control sample. There are other potential application areas for this method, such as studies of uncommon expertise, targeted quality checks in small production industries, or animal studies where rearing large experimental groups might be too costly or unethical.
However, since these methods are most commonly used within the field of neuropsychology, the related nomenclature will be adopted here. An abnormally low score on some variate compared to the control sample will be referred to as a deficit, a concept important both in clinical work and in research. For the latter, another concept is also considered to be of great interest: an abnormally large discrepancy between two variates. This is known as a dissociation and provides evidence for independence between cognitive abilities. By mapping such discrepancies it is possible to build theories of the architecture of the human mind (Shallice 1988).
During the last 20 years, methods have been developed that allow researchers to estimate abnormality and test for deficits and dissociations while retaining control of Type I errors. This has mainly been done by John Crawford and Paul Garthwaite, e.g. Crawford and Howell (1998), Crawford, Garthwaite, and Ryan (2011), Crawford and Garthwaite (2007), Crawford and Garthwaite (2002) and Crawford and Garthwaite (2005). These methods have been implemented in freely available software, but only as standalone, closed-source programs for Windows. Most of the programs require manual input of summary statistics and cannot be used on raw data. Full documentation of the implemented methods is only available in the original publications.
This was the motivation behind the development of the R
package singcar
(Rittmo and McIntosh
2021). We wanted to encourage and simplify the usage of these
methods by bringing them together in a fully documented package with
open source code that works across platforms. In Section @ref(section2)
all implemented methods will be described in detail. This contribution
will provide a comprehensive overview of the methods available together
with their advantages and disadvantages.
The development of Crawford and Garthwaite’s methods has focused on limiting Type I errors. In recent work, however, we showed the inherent inability to control Type II errors with this experimental design and argued that, for some applications of the methods, this limitation might be even more detrimental than an increased Type I error rate (McIntosh and Rittmo 2020). Therefore we also provide power calculators for each test function in the package.
When we conduct single case comparisons we want to estimate the probability that a random draw from the distribution estimated by the sample is more extreme than the single observation we are interested in. This means that the power of a case comparison hypothesis test is contingent on the standard deviation of the distribution, not on the standard error of the test statistic. Increasing the sample size (which usually is done to increase power) therefore only leads to a better estimation of the control distribution, not to a reduction of the standard error. Hence, the limit of achievable power is mostly contingent on the size of the effect, and when investigating unique cases it is not always possible to target larger effects to get around this problem.
One approach that often is highlighted for its potential to increase power is multivariate hypothesis testing. This is however only true to a certain extent (Frane 2015) and one should not force a multivariate analysis on a univariate research question. Focus hitherto has been univariate in nature, but for a neuropsychological dissociation to be convincing the optimal experimental design would include multiple tasks converging on the cognitive functions of interest, performed on more than one occasion (Shallice 1988; McIntosh and Rittmo 2020). Integrating such information both across tasks as well as over several performances could be aided by multivariate analyses.
This calls for the need of methods that estimate a case’s abnormality
in a multidimensional space estimated by a sample. The most obvious way
to do this would of course entail the p value of a Hotelling’s T2-score. However, Elfadaly, Garthwaite, and Crawford (2016) show
that this statistic is substantially biased as a measure of abnormality.
They investigate nine other candidates including two point estimates
based on polynomial expansion, which gave rise to the lowest bias. These
point estimates would, however, sometimes yield estimates out of the [0,
1] range and their interpretation is not very intuitive. Because ease of
interpretation is paramount in scientific reporting, Elfadaly, Garthwaite, and Crawford (2016)
suggested using an adapted median estimator unless bias is of
“immediate importance”, in which case the polynomial estimators should
be used. A compiled computer program for this adapted median estimator
is available (Garthwaite,
Elfadaly, and Crawford 2017), but again, no open source
software working across platforms. This relatively novel estimation
method has also been implemented in singcar
.
The following section, Section @ref(section2), will review available
methods for single case comparisons, their theory and how they can be
used in R
. Section @ref(power) gives a brief review of
statistical power in this experimental design and the implementation of
power calculators in singcar
, but for a more thorough and
perhaps more accessible review see McIntosh and
Rittmo (2020).
Variates of interest will in a neuropsychological context often be scores on some task assessing a cognitive function. Therefore they will often be referred to as “task scores”. It has not been uncommon in this field to compare patients to a control sample, treating the sample statistics as population parameters and estimating deficits by evaluating the p value associated with the patient’s Z score from the estimated distribution. Estimating dissociations was done in a similar way using a method developed by Payne and Jones (1957), where a Z score is calculated from the distribution of difference scores in the control sample.
This is of course problematic if small samples are used, which is often the case in neuropsychology, since the sampling distribution of the sample variance is right skewed for small sample sizes. If a test statistic is constructed by dividing by a quantity with such a skewed sampling distribution, the obtained values, and thus the Type I error rate, will be inflated (Crawford and Howell 1998).
In the following sections I will describe methods that have been devised as replacements for the Z score methods. Both frequentist and Bayesian significance tests have been developed; these are covered in Section @ref(sec22) and Section @ref(sec23) respectively. More complex model designs can be achieved using Bayesian regression techniques, described in Section @ref(cov), or linear mixed models, described in Section @ref(lmm). For estimating multivariate abnormality, see Section @ref(section4).
In the following sections the theory behind the methods in
singcar
is reviewed. But at the end of each subsection the
usage of the implementations will be exemplified. To aid this the
package comes with the dataset size_weight_illusion
, which
is a dataset from a neuropsychological study investigating the
size-weight illusion in a well known patient “DF” (Hassan et al.
2020). DF incurred damage to the lateral occipital complex
which gave rise to visual form agnosia, potentially making her less
susceptible to the size-weight illusion (Dijkerman et al.
2004). This illusion is a perceptual phenomenon in which smaller
objects are perceived as heavier during lifting than larger objects
of equal weight (Buckingham
2014).
However, due to the location of DF’s brain damage one would only suspect reduced susceptibility to the illusion for visual cues, since her other sensory processing should be unaffected by the damage. In the study by Hassan et al. (2020) the illusion was tested given visual cues in one condition and kinaesthetic cues in another. It was predicted that DF would be abnormally less susceptible to the visual size-weight illusion and that there would be an abnormally large discrepancy between her visual and kinaesthetic size-weight illusion. The control sample consisted of 28 participants whose age and sex were recorded along with their performance in the two conditions. The size-weight illusion is given in the dataset as a scaled measure of the number of grams of weight difference perceived per cubic cm of volume change. To use this data, start by installing and loading the package:
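One way to do so, and to display the first rows of the dataset as shown below, is:

install.packages("singcar")
library("singcar")
head(size_weight_illusion)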
## GROUP PPT SEX YRS V_SWI K_SWI
## 1 SC DF Female 65 0.02814921 0.10012712
## 2 HC E01 Female 70 0.21692634 0.21792930
## 3 HC E02 Male 64 0.17223226 0.26338899
## 4 HC E03 Female 63 0.07138049 0.09331700
## 5 HC E04 Female 65 0.10186453 0.25938045
## 6 HC E05 Female 66 0.15911439 0.08922615
V_SWI
and K_SWI
are the measurements for
visual and kinaesthetic size-weight illusion respectively. Most
functions in singcar
can take summary data as input, but
for demonstrative purposes the raw data of this dataset will be used. To
illustrate usage of the methods more generally, the scores of the
variates of interest as well as of the covariate YRS
are
extracted from the data and renamed:
caseA <- size_weight_illusion[1, "V_SWI"]
contA <- size_weight_illusion[-1, "V_SWI"]
caseB <- size_weight_illusion[1, "K_SWI"]
contB <- size_weight_illusion[-1, "K_SWI"]
caseAGE <- size_weight_illusion[1, "YRS"]
contAGE <- size_weight_illusion[-1, "YRS"]
Here, caseA
and caseB
refer to the case
scores on some task A and B respectively. Similarly, contA
and contB
refer to the scores of the control sample on the
same tasks.
When we are comparing a patient to a healthy population on some normally distributed variate that we are estimating from a small sample, we are in fact not estimating abnormality from a normal distribution but a central t distribution.
Hence, we must use a t score rather than a Z score if we want to estimate deficits without inflating effect sizes. This was noted by Sokal and Rohlf (1981) (p. 227), who devised a test statistic for single cases in the biological sciences, but it was first popularised within neuropsychology by Crawford and Howell (1998). Its common application is as a one-tailed “test of deficit” (TD) that estimates the proportion of the healthy population that would exhibit a lower score than the case on some cognitive task. If this proportion falls below some chosen threshold the patient can be diagnosed with a deficit.
This basic approach is simply a modified two-sample t test,
where one of the samples is treated as having size n = 1:
\begin{equation}
t_{n-1} = \frac{y^* - \overline{y}}{s\sqrt{\frac{n + 1}{n}}}.
(#eq:TD)
\end{equation}
The degrees of freedom then quite naturally become n + 1 − 2 = n − 1. Here $\overline{y}, \ s$ and n are the sample mean, standard deviation and size of the control sample and y* is the case score. This method allows transparent control over the Type I error rate (Crawford, Garthwaite, and Howell 2009; Crawford et al. 2004; Crawford and Garthwaite 2012) when used for significance testing. In addition, the p value associated with the obtained t statistic gives an unbiased point estimate of the abnormality of the case. That is, it provides us with the expected percentage of the normal population that would exhibit a more extreme score than the case. This estimate is often of main interest in neuropsychology. The simple proof of this, demonstrated by Crawford and Garthwaite (2006), is outlined below.
The percentage of a population expected to score lower on some variate Y than the case y* is ℙ[Y < y*]. Subtracting $\overline{y}$ from both sides of the inequality and dividing by $s \sqrt{\frac{n + 1}{n}}$ gives $$ \mathbb{P}[Y< y^*] =\mathbb{P}\left[\frac{Y-\overline{y}}{s \sqrt{\frac{n + 1}{n}}} < \frac{y^* - \overline{y}}{s \sqrt{\frac{n + 1}{n}}}\right]. $$ The quantity to the left of the inequality, i.e., $\frac{Y-\overline{y}}{s \sqrt{\frac{n + 1}{n}}}$, is t distributed with n − 1 degrees of freedom. Hence, $$ \mathbb{P}[Y< y^*] =\mathbb{P}\left[t_{n-1} < \frac{y^* - \overline{y}}{s \sqrt{\frac{n + 1}{n}}}\right]. $$ To the right of the inequality we have the test statistic from (@ref(eq:TD)). Therefore, ℙ[Y < y*] is the p value from the test of deficit. This simple fact also makes the construction of confidence intervals for the abnormality (i.e. p) possible.
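As a quick numerical check of this relation, the TD statistic and its one-sided p value can be computed directly from the example data introduced above, using base R only (a small illustration; the package function TD() shown later does this, and more, for you):

n <- length(contA)
t_td <- (caseA - mean(contA)) / (sd(contA) * sqrt((n + 1) / n))
pt(t_td, df = n - 1)   # estimated proportion of controls scoring below the case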
Although the Z score is not an appropriate statistic to use for significance testing when the control sample is small, it does provide a standardised effect measure of the deficit of interest, similar to Cohen’s d, because insensitivity to sample size is a requirement for effect size measures (Cohen 1988; Crawford, Garthwaite, and Porter 2010). Crawford, Garthwaite, and Porter (2010) therefore proposed the use of Z scores as effect size measures within single-case neuropsychology and dubbed this measure ZCC, where the subscript indicates “case-controls” comparison:
\begin{equation}
Z_{CC} = \frac{y^* - \overline{y}}{s}.
(#eq:zcc)
\end{equation}
If p is the percentage of the population that would fall below or above a case score, ZCC can be used to construct 100(1 − α)% confidence intervals around p. The method to do so was described by Crawford and Garthwaite (2002).
Given $y^* \neq \overline{y}$, the quantity $Z_{CC}\sqrt{n}$ follows a non-central t distribution on n − 1 degrees of freedom. We find the value δU, the non-centrality parameter (NCP) of a non-central t distribution with df = n − 1, such that its $100\frac{\alpha}{2}$ percentile equals $Z_{CC}\sqrt{n}$, and correspondingly we find the NCP δL of a non-central t distribution such that its $100(1-\frac{\alpha}{2})$ percentile equals $Z_{CC}\sqrt{n}$. The easiest approach to finding these quantities is to deploy a search algorithm. We can then construct the upper and lower boundaries for ZCC as: $$ Z_{CC_U} = \frac{\delta_U}{\sqrt{n}}, \ \ Z_{CC_L} = \frac{\delta_L}{\sqrt{n}}. $$ When the case falls on the left side of the distribution the boundaries for p are given by: $$ p_U = \Phi\left(\frac{\delta_U}{\sqrt{n}}\right), \ p_L = \Phi\left(\frac{\delta_L}{\sqrt{n}}\right), $$ where Φ is the CDF of the standard normal distribution. And when the case falls on the right side of the distribution the boundaries are given by: $$ p_U = 1 - \Phi\left(\frac{\delta_L}{\sqrt{n}}\right), \ p_L = 1 - \Phi\left(\frac{\delta_U}{\sqrt{n}}\right). $$
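A sketch of such a search in R, using uniroot() on the non-central t quantile function with the example data (an illustration of the principle rather than the exact singcar implementation; the search interval limits are arbitrary but wide):

n <- length(contA)
zcc <- (caseA - mean(contA)) / sd(contA)
alpha <- 0.05
# NCP whose alpha/2 percentile equals Z_CC * sqrt(n)
delta_U <- uniroot(function(d) qt(alpha / 2, df = n - 1, ncp = d) - zcc * sqrt(n),
                   interval = c(-50, 50))$root
# NCP whose 1 - alpha/2 percentile equals Z_CC * sqrt(n)
delta_L <- uniroot(function(d) qt(1 - alpha / 2, df = n - 1, ncp = d) - zcc * sqrt(n),
                   interval = c(-50, 50))$root
c(Z_CC_L = delta_L / sqrt(n), Z_CC_U = delta_U / sqrt(n))   # CI for Z_CC
pnorm(c(delta_L, delta_U) / sqrt(n)) * 100                  # CI for p, case below the mean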
Evidently, estimating abnormality on a single variate is trivial. It is equally simple to estimate abnormality on the difference between two variates Y1 − Y2 if the normally distributed variates Y1 and Y2 do not need standardisation to be comparable. If this is the case, the test statistic is similar to (@ref(eq:TD)), but one divides by the standard deviation of the difference scores:
\begin{equation}
t_{n-1} = \frac{(y^*_1 - \overline{y}_1) - (y^*_2 - \overline{y}_2)}{\sqrt{s^2_1 + s^2_2 - 2 s_1 s_2 \rho_{12}}\sqrt{\frac{n + 1}{n}}},
(#eq:UDT)
\end{equation}
where ρ12 is the correlation between the variates. Confidence intervals for this “unstandardised difference test” (UDT) are constructed as for the TD. Applications testing unstandardised differences are however scarce, at least within neuropsychology. Much more common is the need to assess abnormality of differences between variates that require standardisation to be comparable.
If we just standardise the variates in (@ref(eq:UDT)) and do not take sample size into account, we get an effect size measure of the difference (dissociation) similar to ZCC, (@ref(eq:zcc)):
\begin{equation}
Z_{DCC} = \frac{z^*_1 - z^*_2}{\sqrt{2 - 2\rho_{12}}}.
(#eq:PJ)
\end{equation}
The two quantities in the numerator are the standardised case scores on variates Y1 and Y2 and the subscript indicates “discrepancy-case-controls”. This measure was proposed as a reporting standard in the single-case literature by Crawford, Garthwaite, and Porter (2010), but was first suggested as a statistic for significance testing by Payne and Jones (1957). It is of course not appropriate for such use, since the resultant statistic (ZDCC) would be inflated because the quantities in the numerator (z1* and z2*) are themselves inflated.
When we only can estimate standardised case scores from small samples and need to test if the difference between them is abnormally large we are in fact estimating the difference between two t distributed variates. Since linear combinations of correlated t distributed variates are not t distributed themselves this is not a trivial problem.
Garthwaite and Crawford (2004) examined the distribution of such difference scores using asymptotic expansion. They searched for quantities that divided by a function of the sample correlation would be asymptotically t distributed. They found:
ρ being the sample correlation and c a pre-specified critical two-tailed t value on df = n − 1. They showed that ℙ[ψ > c] ≈ ℙ[t > c] and that one must solve for ψ = c to obtain a precise probability of ψ which yields a quantity that is not dependent on a specified critical value. This is a quadratic equation in c2, choosing the positive root of which yields:
where p = ℙ[tn − 1 > c] is used for significance testing and is the point estimate of the percentage of the control population that is expected to show a more extreme difference score than the case. Note that the test statistic c will always be positive, no matter the direction of the effect.
Crawford and Garthwaite (2005) refer to this test statistic as the “revised standardised difference test” (RSDT) because it was an iteration on a test previously developed by the authors that did not keep control of the Type I error rate (Crawford, Howell, and Garthwaite 1998). Crawford and Garthwaite (2005) show that the RSDT controls this error rate fairly well.
Using simulations the RSDT was shown to barely exceed the specified error rate even for very small samples such as n = 5. However, when a case lacks a dissociation but exhibits extreme scores on both variates (in the same direction of course) the error rate increases steeply. According to one simulation study the RSDT starts to lose control of the error rate for scores that are simultaneously more extreme than two standard deviations away from the mean and for task scores as extreme as 8 standard deviations away from the mean, error rates as high as 35% were observed Crawford and Garthwaite (2007).
Another issue with the RSDT is that, because ψ is only approximately t distributed, constructing confidence intervals for estimated abnormality in the same manner as for the TD is not possible. So to remedy these drawbacks of this test statistic Crawford and colleagues looked into Bayesian methodology.
singcar
The test of deficit is called in the singcar
package
with the function TD()
. Testing for a deficit using the
example data from Section @ref(exdata) we have:
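The call takes the case score and the control scores (argument names as assumed here; see the package documentation):

TD(case = caseA, controls = contA, alternative = "less")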
##
## Test of Deficit
##
## data: Case = 0.03, Controls (m = 0.16, sd = 0.08, n = 28)
## t = -1.7243, df = 27, p-value = 0.04804
## alternative hypothesis: true diff. between case and controls is less than 0
## sample estimates:
## Std. case score (Z-CC), 95% CI [-2.34, -1.15]
## -1.754857
## Proportion below case (%), 95% CI [0.95, 12.47]
## 4.804003
The argument alternative
specifies the alternative
hypothesis, i.e., whether we are performing a two-sided or a one-sided test and, if
the latter, in which direction. If one instead wants to test for a
dissociation the most common alternative hypothesis is two sided. Using
the RSDT for this test, call:
RSDT(case_a = caseA, case_b = caseB, controls_a = contA, controls_b = contB,
alternative = "two.sided")
##
## Revised Standardised Difference Test
##
## data: Case A: 0.03, B: 0.10, Ctrl. A (m, sd): (0.16, 0.08), B: (0.18, 0.10)
## approx. abs. t = 1.015, df = 27, p-value = 0.3191
## alternative hypothesis: true difference between tasks is not equal to 0
## sample estimates:
## Std. case score, task A (Z-CC) Std. case score, task B (Z-CC)
## -1.7548574 -0.7836956
## Std. task discrepancy (Z-DCC) Proportion below case (%)
## -1.0647889 15.9560625
Notably, this function does not produce confidence intervals. If the
two tasks are directly comparable without the need of standardisation
(which in fact is the case for this specific dataset), call instead
UDT()
with the same syntax and confidence intervals will be
given.
The most prominent difference between the frequentist and the Bayesian framework is that in the former parameters are treated as fixed attributes of a population and in the latter they are themselves treated as random variables with associated distributions. To estimate these one can use prior knowledge about the parameter of interest, which is specified as a prior distribution. There is often a desire that the data should be the only thing influencing estimations. If so, one can reduce the impact of the prior by, for example, assigning equal probabilities to all possible parameter values. This is an example of what is known as a non-informative prior distribution. This distribution is then updated when new information is obtained from data and as such forms the posterior parameter distribution. This is calculated using Bayes’ theorem and, if we omit the normalising constant in the denominator of Bayes’ theorem, it can be expressed as: $$ f(\theta \mid \text{data}) \propto f(\text{data} \mid \theta) f(\theta). $$
If we have a hypothesis for a parameter value, what the above is saying is that the posterior density of this hypothesis is proportional (∝) to the likelihood of the data under that hypothesis multiplied by the prior probability of the hypothesis.
One of the disadvantages of Bayesian methodology is that it might be impossible, or at least very difficult, to calculate the posterior analytically. Posteriors are, however, often feasible to construct numerically using Monte Carlo methods. One of the reasons Bayesian statistics has seen such an upsurge is the increase in computational power in recent years, which has made many otherwise infeasible numerical strategies possible. Monte Carlo methods are mathematical algorithms that solve numerical problems by repeated random sampling. The algorithms vary depending on the problem at hand, but for Bayesian methodology they all build on rules for drawing random values based on the likelihood and the prior – as such “constructing” the posterior from the draws after a large number of iterations. The peak of the constructed distribution is often the parameter estimate of interest and the width is a measure of the certainty of the estimation. When using non-informative priors the peak often coincides with the maximum likelihood estimate of the parameter. This means that non-informative priors often yield estimators with frequentist properties, necessary for null hypothesis significance testing.
Because the testing properties of the estimation methods presented here
are of main interest, frequentist properties were required when Crawford and Garthwaite (2007) and Crawford, Garthwaite, and Ryan (2011) devised
Bayesian analogues to the test functions presented in Section
@ref(sec22). The procedural details of these methods will be described
in the following sections. This is to provide details of how they were
implemented in singcar
as well as an overview of how they
operate. The methods are all based on Monte Carlo methods and some are,
computationally, very intense. They are implemented in
singcar
as straight R code but would probably have
benefited from being implemented in compiled code first. This is an aim
for future updates of the package. The algorithms presented in the following
closely follow the original articles, but with a few slight changes of
notation to make the presentation more coherent.
The Bayesian analogue of the test of deficit (BTD) produces asymptotically identical output as the frequentist test of deficit. It produces an estimate of p with accompanying credible intervals, i.e. the percentage of controls that would exhibit a more extreme score than the case. The method is included here for completeness but for actual analysis there is no advantage to using it over the TD, (@ref(eq:TD)).
From a sample of size n we obtain measurements of some normally distributed variate Y with unknown mean μ and unknown variance σ2. Let $\overline{y}$ and s2 denote the sample mean and variance and y* the case score. The prior distribution of μ is conditioned upon the prior distribution of σ2 since both parameters are unknown. The prior considered here is f(μ, σ2) ∝ (σ2)−1, which is the standard non-informative prior for normal data. The marginal posterior distribution of $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$ and the posterior distribution $\mu|\sigma^2 \sim \mathcal{N}(\overline{y}, \sigma^2/n)$, see e.g., Gelman et al. (2013) (p. 45 and 65).
When we have a posterior distribution for the control population parameters, p can be estimated by iterating the following steps: draw an observation $\psi$ from the $\chi^2_{n-1}$ distribution above and put $\hat{\sigma}^2 = (n-1)s^2/\psi$; draw $\hat{\mu}$ from $\mathcal{N}(\overline{y}, \hat{\sigma}^2/n)$; finally, compute $\hat{z}^* = (y^* - \hat{\mu})/\hat{\sigma}$ and the estimate $\hat{p} = \Phi(\hat{z}^*)$.
Iterating these steps a large number of times yields a distribution of p̂, the mean of which is the point estimate of p, which can be used for significance testing. The 2.5th and 97.5th percentile of the distribution of p̂ as well as of z* gives the boundaries for the 95% credible interval of p̂ and ẐCC respectively. The point estimate of the latter is that of (@ref(eq:zcc)).
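A minimal sketch of this sampling scheme in R, using the example data (an illustration of the algorithm rather than the exact singcar implementation):

iter <- 10000
n <- length(contA)
ybar <- mean(contA)
s2 <- var(contA)
sigma2_hat <- (n - 1) * s2 / rchisq(iter, df = n - 1)          # draws of sigma^2 | data
mu_hat <- rnorm(iter, mean = ybar, sd = sqrt(sigma2_hat / n))  # draws of mu | sigma^2, data
z_hat <- (caseA - mu_hat) / sqrt(sigma2_hat)                   # standardised case scores
p_hat <- pnorm(z_hat)                                          # proportion below the case
mean(p_hat)                                                    # point estimate of p
quantile(p_hat, c(0.025, 0.975))                               # 95% credible interval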
In contrast to the BTD, the Bayesian analogue of the difference tests
does not produce asymptotically identical results to the RSDT.
For the Bayesian standardised difference test (BSDT) we now obtain
measurements on Y1
and Y2 following a
bivariate normal distribution representing two tasks of interest, from a
control sample of size n. The
goal is to estimate the percentage p of the control population that
would be expected to show a greater difference Y1 − Y2
than the case.
Let $\overline{y}_1$ and $\overline{y}_2$ be the sample means
and
$$ A = \begin{bmatrix} s^2_{1} & s_{12} \\ s_{12} & s^2_{2} \end{bmatrix} $$
the sample variance-covariance matrix. Then S = A(n − 1)
is the sums of squares and cross products (SSCP) matrix. The case scores
are denoted y1* and y2*.
The population parameters are both unknown. To estimate p we must first obtain a posterior of these parameters. Since both are unknown the prior distribution of μ is again conditioned upon the prior distribution of Σ.
The prior considered for f(μ, Σ−1) in Crawford and Garthwaite (2007) was f(μ, Σ−1) ∝ |Σ|. This is a common non-informative prior, however, it was chosen in favour of the perhaps even more commonly used f(μ, Σ−1) ∝ |Σ|(k + 1)/2 for multivariate normal data, where k= the number of variates. This was because the former prior was shown to give asymptotically identical interval estimates as the UDT for unstandardised case scores. Even though the latter perhaps is more commonly applied, Crawford and Garthwaite (2007) refer to the former as the “standard theory” prior.
For this prior we have that Σ−1 ∼ 𝒲k(S−1, n). That is, the posterior marginal distribution for Σ−1 is a Wishart distribution parametrised with scale matrix S−1 and n degrees of freedom. The Wishart distribution is a multivariate generalisation of the χ2 distribution. It is possible to view S ∼ 𝒲p(Σ, n) as a distribution of SSCP-matrices associated with Σ. Inversely, one can view the inverse Wishart distribution parametrised with some SSCP matrix and n degrees of freedom as a distribution of variance-covariance matrices related to that SSCP matrix.
We then have that $\pmb\mu|\Sigma \sim \mathcal{N}_k(\pmb{\overline{y}}, \Sigma/n)$ where $\mathcal{N}_k(\pmb{\overline{y}}, \Sigma/n)$ denotes a multivariate normal distribution with mean $\pmb{\overline{y}}$ and variance Σ/n, see e.g., Gelman et al. (2013) (p. 72-73).
The posterior marginal distribution of Σ is then constructed by iterating draws of random observations from 𝒲−1(S, n), each observation, Σ̂(i), being the estimation of Σ for the ith iteration.
As previously mentioned, frequentist properties are desired if the estimations are to be used for significance testing. Berger and Sun (2008) investigated convergence to frequentist estimates for bivariate normal distributions using various priors. They showed that the “standard theory” prior produced converging estimates of σ1 and σ2 but that the convergence was not as good for ρ and differences in means. As a “general purpose” prior they instead recommend using $f(\pmb\mu, \Sigma^{-1}) \propto \frac{1}{\sigma_1\sigma_2(1-\rho^2)}$. Posteriors with this prior are constructed using rejection sampling. Random draws from an inverse Wishart on n − 1 degrees of freedom are generated, but only a subset of the observations is retained.
Crawford, Garthwaite, and Ryan (2011) considered this prior but noticed that it gave rise to too narrow credible intervals, overestimating the certainty of the estimations. However, they showed that if the sample was treated as being of size n − 1 when estimating Σ, i.e. so that the generated intervals were somewhat more conservative, frequentist properties were retained. This “calibrated” prior is recommended by Crawford, Garthwaite, and Ryan (2011), and they refer to it as such.
To construct the posterior using the calibrated prior the following steps are iterated:
Since we are reducing the sample size we must put S* = (n − 2)S/(n − 1) to retain the unbiased property of S/(n − 1) as an estimate of Σ.
Draw a random observation from 𝒲−1(S*, n − 2). This draw is denoted: $$ \hat{\Sigma} = \begin{bmatrix} \hat{\sigma}^2_{1} & \hat{\sigma}_{12} \\ \hat{\sigma}_{12} & \hat{\sigma}^2_{2} \end{bmatrix} $$ and $$ \hat{\rho}= \frac{\hat{\sigma}_{12}}{\sqrt{\hat{\sigma}^2_{1}\hat{\sigma}^2_{2}}}. $$
If u is a random draw from a uniform distribution u ∼ U(0, 1), Σ̂ is only accepted as the estimation of Σ if u2 ≤ 1 − ρ̂2; if not, we repeat the draw until we have an accepted Σ̂. Once accepted we have an estimation of Σ for this iteration and denote the draw: $$ \hat{\Sigma}_{(i)} = \begin{bmatrix} \hat{\sigma}^2_{1(i)} & \hat{\sigma}_{12(i)} \\ \hat{\sigma}_{12(i)} & \hat{\sigma}^2_{2(i)} \end{bmatrix}. $$
Even though this latter calibrated prior is recommended, some might argue that frequentist properties are of no concern and therefore choose the “standard theory” prior. But regardless of the prior chosen, when we have an estimate of Σ, the following steps are iterated to obtain an estimate of p, the percentage of controls expected to exhibit a more extreme difference between the variates than the case.
Generate two draws from a standard normal distribution and denote them zr1 and zr2. Find the lower triangular matrix T such that TT′ = Σ̂(i) using Cholesky decomposition. The estimate of μ for this iteration is then: $$ \hat{\boldsymbol{\mu}}_{(i)} = \overline{\boldsymbol{y}} + \boldsymbol{T}\begin{pmatrix} z_{r1} \\ z_{r2} \end{pmatrix}\Big/\sqrt{n}. $$ See for example Gelman et al. (2013) (p. 580).
We now have estimates of both μ and Σ and calculate p as if these were the true
population parameters. The method to do so differs slightly depending on
whether a standardised or unstandardised test is required. For an
unstandardised test, the size of the difference score is calculated
as: $$ z^*_{(i)} = \frac{(y^*_1 - \hat{\mu}_{1(i)}) - (y^*_2 - \hat{\mu}_{2(i)})}{\sqrt{\hat{\sigma}^2_{1(i)} + \hat{\sigma}^2_{2(i)} - 2\hat{\sigma}_{12(i)}}}. $$
More commonly we would want to calculate the difference score from
standardised variates. If this is the case, put $$ \hat{z}_{1(i)} = \frac{y^*_1 - \hat{\mu}_{1(i)}}{\hat{\sigma}_{1(i)}} \ \text{and} \ \hat{z}_{2(i)} = \frac{y^*_2 - \hat{\mu}_{2(i)}}{\hat{\sigma}_{2(i)}}, $$ and compute $$ z^*_{(i)} = \frac{\hat{z}_{1(i)} - \hat{z}_{2(i)}}{\sqrt{2 - 2\hat{\rho}_{(i)}}}. $$
The estimate of p, p̂(i), for this iteration is then the tail area of a standard normal distribution more extreme than z(i)*.
Iterate the procedure a large number of times and save the estimations for each iteration. This will yield a distribution of p̂ and ẑ*, the quantiles of which again are used to calculate the boundaries of the credible intervals for p and ZDCC. The point estimate of the former is the mean of the estimated distribution, while the point estimate of the latter is that of (@ref(eq:PJ)).
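A sketch of the corresponding loop for the BSDT with the calibrated prior, again using the example data (a simplified illustration of the steps above, not the exact singcar implementation):

iter <- 10000
n <- length(contA)
ybar <- c(mean(contA), mean(contB))
S <- cov(cbind(contA, contB)) * (n - 1)        # SSCP matrix
S_star <- (n - 2) * S / (n - 1)                # rescaled for the calibrated prior
p_hat <- numeric(iter)
for (i in seq_len(iter)) {
  # Rejection sampling of Sigma from an inverse Wishart on n - 2 df
  repeat {
    Sigma_hat <- solve(rWishart(1, df = n - 2, Sigma = solve(S_star))[, , 1])
    rho_hat <- Sigma_hat[1, 2] / sqrt(Sigma_hat[1, 1] * Sigma_hat[2, 2])
    if (runif(1)^2 <= 1 - rho_hat^2) break
  }
  # Draw mu | Sigma via a Cholesky factor of Sigma/n
  mu_hat <- ybar + t(chol(Sigma_hat / n)) %*% rnorm(2)
  # Standardised difference score for this draw
  z1 <- (caseA - mu_hat[1]) / sqrt(Sigma_hat[1, 1])
  z2 <- (caseB - mu_hat[2]) / sqrt(Sigma_hat[2, 2])
  p_hat[i] <- pnorm((z1 - z2) / sqrt(2 - 2 * rho_hat))
}
mean(p_hat)                                    # point estimate of p
quantile(p_hat, c(0.025, 0.975))               # 95% credible interval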
singcar
Testing for a deficit using the Bayesian test of deficit, call
BTD()
:
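A call of the following form (argument names as assumed here) gives:

BTD(case = caseA, controls = contA, alternative = "less", iter = 10000)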
##
## Bayesian Test of Deficit
##
## data: Case = 0.03, Controls (m = 0.16, sd = 0.08, n = 28)
## df = 27, p-value = 0.04787
## alternative hypothesis: true diff. between case and controls is less than 0
## sample estimates:
## Std. case score (Z-CC), 95% CI [-2.35, -1.15]
## -1.754857
## Proportion below case (%), 95% CI [0.94, 12.57]
## 4.787144
As can be seen, compared to TD()
this function has the
additional argument iter
, which indicates the number of
iterations to be used for the posterior approximation. This number
should be based on desired accuracy of the analysis.
Testing for a dissociation using the BSDT (recommended over RSDT), call:
BSDT(case_a = caseA, case_b = caseB, controls_a = contA, controls_b = contB,
alternative = "two.sided", iter = 10000, unstandardised = FALSE,
calibrated = TRUE)
##
## Bayesian Standardised Difference Test
##
## data: Case A: 0.03, B: 0.10, Ctrl. A (m, sd): (0.16,0.08), B: (0.18,0.10)
## df = 26, p-value = 0.3212
## alternative hypothesis: true difference between tasks is not equal to 0
## sample estimates:
## Std. case score, task A (Z-CC)
## -1.7548574
## Std. case score, task B (Z-CC)
## -0.7836956
## Std. discrepancy (Z-DCC), 95% CI [-1.69, -0.42]
## -1.0647889
## Proportion below case (%), 95% CI [4.60, 33.67]
## 16.0600481
This function has several additional arguments compared to
RSDT()
. Instead of having a separate function for an
unstandardised Bayesian difference test one can specify whether to
standardise the variates or not within the function. If
unstandardised
is set to TRUE
, the results
from BSDT()
will converge to that of UDT()
. To
use the “standard theory” prior instead of the calibrated, set
calibrated
to FALSE
.
It is seldom possible to design perfect experiments. This is especially true in psychological science where causal relationships can be confounded by a great variety of factors. When comparing a case to a control sample it is therefore important to reduce noise by matching the control group to the case as well as possible on variates beyond the ones of interest, such as age or education level, making the collection of control samples cumbersome.
However, this matching of covariates can instead be achieved statistically. Crawford, Garthwaite, and Ryan (2011) describe methods that extend the tests presented in Section @ref(sec22) to allow for the inclusion of covariates using Bayesian regression. These methods thus allow you to compare the case’s score on the task(s) of interest against the control population conditioned on the controls having the same value as the case on the covariate(s). This both reduces noise and eases the collection of control data. But because one degree of freedom is lost for each included covariate, it is advisable, as a rule of thumb, to only include covariates with a correlation to the variate(s) of interest larger than 0.3 (Crawford, Garthwaite, and Ryan 2011).
We assume that the variates of interest are normally distributed, but no assumption of the distribution of the covariates is made. The procedure to get an estimate of p in the presence of covariates is presented below for the general case, that is either when we are interested in a deficit or a dissociation. The main difference between the methods is the recommended prior.
From a sample of size n we obtain measurements on k = 1 or 2 variates of interest and m covariates, denoted X = (X1, …, Xm).
We wish to estimate the regression coefficients for each covariate on the variate(s) of interest: $$ \boldsymbol{B} = \begin{bmatrix} \boldsymbol{\beta}_1 & \cdots & \boldsymbol{\beta}_k \end{bmatrix}. $$ Here B is an (m + 1) × k matrix where each column contains the regression coefficients for one of the variates of interest, the first row being the intercept(s). Further, we wish to estimate the covariance matrix or variance of the variate(s) of interest conditioned upon the covariates; these statistics are assumed not to vary with the covariates. $$ \Sigma \ | \ \boldsymbol{X} = \begin{bmatrix} \sigma^2_1 \ | \ \boldsymbol{X} & \rho\sigma_1\sigma_2 \ | \ \boldsymbol{X} \\ \rho\sigma_1\sigma_2 \ | \ \boldsymbol{X} & \sigma^2_2 \ | \ \boldsymbol{X} \end{bmatrix} \ \text{or} \ \left[\sigma^2 | \boldsymbol{X} \right]. $$
If X is the n × (m + 1) design matrix and Y, the n × k response matrix $$ \boldsymbol{X} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nm} \end{bmatrix},\ \ \boldsymbol{Y} = \begin{bmatrix} y_{11} & \cdots & y_{1k} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nk} \end{bmatrix} $$ the data estimates of B and Σ are $$ \boldsymbol{B}^* = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{Y} \ \text{and} \ \Sigma^* = \frac{1}{n-m-1}(\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{B}^*)'(\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{B}^*). $$
When k = 1 or the “standard theory” prior is used for k = 2 the posterior of Σ is an inverse Wishart with scale matrix (n − m − 1)Σ* on n − m − 1 or n − m degrees of freedom for k = 1 and k = 2 respectively (Crawford, Garthwaite, and Ryan 2011; Tiao and Zellner 1964). Hence, to get an estimate Σ̂(i) of Σ we generate a draw from these distributions.
For the “calibrated” prior we instead follow the procedure outlined in Section @ref(BSDT) and use S* = (n − m − 2)S/(n − m − 1) as scale matrix to sample from an inverse Wishart on n − m − 2 degrees of freedom, where S = (n − m − 1)Σ*. The estimates are then:
\begin{equation}
\hat{\Sigma}_{(i)} = \begin{bmatrix} \hat{s}^2_{1(i)} & \hat{s}_{12(i)} \\ \hat{s}_{12(i)} & \hat{s}^2_{2(i)} \end{bmatrix}, \quad k = 2.
(#eq:sigma)
\end{equation}
We turn the (m + 1) × k matrix B* into a k(m + 1) × 1 vector by concatenating the columns in B*. Bvec* = (β1*′, ..., βk*′)′ is the estimate of Bvec. The posterior of Bvec|Σ̂(i) is a multivariate normal distribution with mean Bvec* and covariance matrix Λ(i) = Σ̂(i) ⊗ (X′X)−1, where ⊗ denotes the Kronecker product. To get an estimate of $\hat{\boldsymbol{B}}_{(i)}$ we generate a random observation from this distribution where the k(m + 1) × 1 vector of obtained values $\hat{\boldsymbol{B}}^*_{\text{vec(i)}} =(\hat{\boldsymbol{\beta}}'_{1(i)}, ...,\hat{\boldsymbol{\beta}}'_{k(i)})'$ is turned back into the m + 1 × k matrix $\hat{\boldsymbol{B}}_{(i)} = (\hat{\boldsymbol{\beta}}_{1(i)}, ...,\hat{\boldsymbol{\beta}}_{k(i)})$. If x* is a vector of the case values on the covariates we obtain estimates of the conditional case scores on the task of interest as so: $$ \hat{\boldsymbol{\mu}}_{(i)} = \hat{\boldsymbol{B}}_{(i)}\boldsymbol{x}^*. $$
With these values we can now estimate the conditional effect size of the case. Crawford, Garthwaite, and Ryan (2011) denote these effect measures ZCCC and ZDCCC depending on whether k = 1 or k = 2. As the names indicate they are related to ZCC (@ref(eq:zcc)) and ZDCC (@ref(eq:PJ)), but calculated from statistics conditioned upon the covariates, hence the extra C in the subscript.
If y1* is the case score on a variate Y1, we would calculate the conditional deficit (ZCCC) of the case on variate Y1 like so:
\begin{equation}
\hat{Z}_{CCC(i)} = \frac{y^*_1 - \hat{\mu}_{(i)}}{\hat{s}_{(i)}}.
(#eq:zccc)
\end{equation}
If y2* is the case score on variate Y2 and we are interested in a dissociation between Y1 and Y2, we denote the two conditional means on these variates μ̂1(i) and μ̂2(i). ZDCCC is then calculated like so:
\begin{equation}
\hat{Z}_{DCCC(i)} = \frac{\frac{y^*_1 - \hat{\mu}_{1(i)}}{\hat{s}_{1(i)}} - \frac{y^*_2 - \hat{\mu}_{2(i)}}{\hat{s}_{2(i)}}}{\sqrt{2 - 2\hat{\rho}_{12(i)}}}.
(#eq:zdccc)
\end{equation}
The conditional standard deviations ŝ1(i) and ŝ2(i) are obtained from Σ̂(i) in (@ref(eq:sigma)), and the conditional correlation between the variates (ρ̂12(i)) is given by $$\hat{\rho}_{12(i)} = \frac{\hat{s}_{12(i)}}{\sqrt{\hat{s}^2_{1(i)}\hat{s}^2_{2(i)}}}.$$
To get estimates of p we find the tail area under the standard normal distribution with values more extreme than ẐCCC(i) or ẐDCCC(i), depending on the problem and alternative hypothesis specified. If testing for a deficit we would for example have: p̂(i) = Φ(ẐCCC(i)).
Iterate the procedure a large number of times and save the estimations for each iteration. The mean of the resultant distribution of p̂ is taken as the point estimate of p. Point estimates of ZCCC and ZDCCC are obtained by using (@ref(eq:zccc)) and (@ref(eq:zdccc)) with the conditional means, standard deviations and correlation calculated directly from the control sample. The 1 − α credible interval boundaries for each estimate are obtained from the quantiles of the estimated distributions, as described for the previous methods.
singcar
To specify a Bayesian test of deficit with one or more covariates, call:
BTD_cov(case_task = caseA, case_covar = caseAGE, control_task = contA,
control_covar = contAGE, alternative = "less", int_level = 0.95,
iter = 10000, use_sumstats = FALSE)
##
## Bayesian Test of Deficit with Covariates
##
## data: Case = 0.03, Controls (m = 0.16, sd = 0.08, n = 28)
## df = 26, p-value = 0.05202
## alternative hypothesis: true diff. between case and controls is less than 0
## sample estimates:
## Std. case score (Z-CCC), 95% CI [-2.31, -1.11]
## -1.749556
## Proportion below case (%), 95% CI [1.05, 13.40]
## 5.202005
For the case, the covariates should be specified as a single value or a vector containing the case's score on each included covariate. For the controls, the covariates should be specified as a vector or matrix where each column represents one covariate.
To specify a Bayesian test for a dissociation with one or more covariates present, call:
BSDT_cov(case_tasks = c(caseA, caseB), case_covar = caseAGE,
control_tasks = cbind(contA, contB), control_covar = contAGE,
alternative = "two.sided", int_level = 0.95, iter = 10000,
calibrated = TRUE, use_sumstats = FALSE)
##
## Bayesian Standardised Difference Test with Covariates
##
## data: Case A = 0.03, B = 0.10, Ctrl. (m, sd) A: (0.16,0.08), B: (0.18,0.10)
## df = 25, p-value = 0.33
## alternative hypothesis: true difference between tasks is not equal to 0
## sample estimates:
## Std. case score, task A (Z-CC)
## -1.754857
## Std. case score, task B (Z-CC)
## -0.783696
## Std. discrepancy (Z-DCCC), 95% CI [-1.67, -0.41]
## -1.064152
## Proportion below case (%), 95% CI [4.79, 34.18]
## 16.500000
Here, the case scores as well as the control scores from variates of
interest must be concatenated. The covariates are specified in the same
manner as for BTD_cov()
. The prior used is again specified
with the argument calibrated
.
If using summary statistics for these functions, the argument
use_sumstats
must be set to TRUE
. Information
on how to specify the data in such a case consult the documentation.
It should be noted that the TD of course is nothing other than a linear regression model with a dummy coded variable for belonging to the patient group. This equivalence might seem trivial but is included here for completeness. If y = (y1*, y2, ..., yn) is the vector of scores for the case y1* and the control sample (y2, ..., yn) on the variate of interest, then x = (1, 0, ..., 0) is a dummy coded variable indicating inclusion in the patient group. The equation for a regression model with these variables is then: yi = β0 + β1xi + ϵi, where the error term ϵi ∼ 𝒩(0, σ2). The least squares estimate of β1, β̂1, equals the numerator in the equation for the TD, (@ref(eq:TD)). We have: $$ \hat{\beta}_1 = y^*_1 - \overline{y}_c. $$ Here $\overline{y}_c$ denotes the sample mean of the controls, which is also the least squares estimate of the intercept, β̂0: $$ \hat{\beta}_0 = \overline{y}_c. $$
To test the slope in a simple regression model one uses the statistic $t = \hat{\beta}_1/SE(\hat{\beta}_1)$, which follows a t distribution on n − 2 degrees of freedom given that the null hypothesis is true. The degrees of freedom for this distribution are of course the same as the nc − 1 degrees of freedom of the TD, if nc is the size of the control sample. Remember that the denominator in the TD was $s_c \sqrt{\frac{n_c + 1}{n_c}}$, where sc is the standard deviation of the control sample. We know that the standard error for the slope coefficient is: $$ SE\left(\hat{\beta}_1\right) = \sqrt{\frac{\frac{1}{n-2}\sum^n_{i =1}\hat{\epsilon}^2_i}{\sum^n_{i=1}(x_i-\overline{x})^2} } = \sqrt{\frac{\frac{1}{n-2}\sum^n_{i=1}(y_i-\hat{y}_i)^2}{\frac{n-1}{n}}}. $$ Furthermore, ŷi = β̂0 + β̂1xi and $\hat{y}_1 = \overline{y}_c+(y^*_1-\overline{y}_c) = y^*_1$, hence: $$ SE\left(\hat{\beta}_1\right) = \sqrt{\frac{\frac{1}{n_c - 1}\sum^n_{i=2}(y_i-\overline{y}_c)^2}{\frac{n_c}{n_c+1}}} = s_c\sqrt{\frac{n_c + 1}{n_c}}. $$ As such the t test for the slope in the regression equation and the test of deficit are identical. To get ZCC when using the regression approach, simply divide β̂1 by the standard deviation of the control sample, ZCC = β̂1/sc.
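As a quick numerical check of this equivalence with the example data, the t value and two-sided p value for the dummy coefficient can be compared with the TD output shown earlier:

y <- c(caseA, contA)
x <- c(1, rep(0, length(contA)))   # dummy coding case membership
summary(lm(y ~ x))$coefficients["x", ]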
The fact that we can use simple linear regression instead of the TD to model case abnormality is in itself not very interesting. However, one can include covariates without going through the trouble of performing Bayesian regression for the methods as described in Section @ref(cov). To compute the conditional effect size ZCCC in this case one must first calculate the conditional mean and standard deviation from the control sample. The conditional mean can be predicted from the fitted model given the case values on the covariates and the conditional variance can be computed like so: $$ \sigma^2|X = \frac{\sum^n_{i=2}\epsilon_i^2}{n-2}, $$ where X denotes the covariates and ϵi the residuals from the fitted model. Note also that the index starts at 2 to leave out the case. ZCCC is then: $$ Z_{CCC} = \frac{y_1^*-\mu|X}{\sqrt{\sigma^2|X}}. $$
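A sketch of this regression approach with the example data, using age as the only covariate (the divisor for the conditional variance follows the formula above; this is an illustration rather than a general-purpose implementation):

y <- c(caseA, contA)
age <- c(caseAGE, contAGE)
x <- c(1, rep(0, length(contA)))                                     # case dummy
fit <- lm(y ~ x + age)
mu_cond <- predict(fit, newdata = data.frame(x = 0, age = caseAGE))  # conditional control mean
sigma2_cond <- sum(residuals(fit)[-1]^2) / (length(y) - 2)           # conditional variance, case left out
(caseA - mu_cond) / sqrt(sigma2_cond)                                # Z_CCC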
More importantly though, we can generalise the linear regression to linear mixed effect models (West, Welch, and Galecki 2014). With linear mixed effect models (LMM) one can model data with dependency between observations, opening for the possibility to use repeated measures data in single case comparisons and other, more complex, designs. This is because LMM can model both fixed and random effects (hence “mixed”). Fixed effects are in this case fixed or constant across participants and random effects are found for observations that are somehow more related to each other (e.g. several measurements from the same participant or hierarchically clustered observations). The concept of applying LMM to single case comparisons was first investigated by Huber et al. (2015).
Since impairments on tasks converging on a cognitive construct of interest provide stronger evidence of a potential deficit, the possibility to model random effects successfully is of great importance. Such a model would take the form:
\begin{equation}
y_{ij} = \beta_0 + \beta_1 x_{1i} + v_{0i} + \epsilon_{ij}.
(#eq:lmm1)
\end{equation}
Here i indicates participant and j indicates task or item, so yij is the response of participant i on task/item j. β0 and β1 are still the intercept of the sample and the raw effect of the case, while v0i denotes the random intercept for each participant. As such the term allows participants to vary from the average intercept of the sample. However, while the model in (@ref(eq:lmm1)) accounts for participant variability, we must include yet a further term to account for task/item variability:
\begin{equation}
y_{ij} = \beta_0 + \beta_1 x_{1i} + v_{0i} + w_{0j} + \epsilon_{ij},
(#eq:lmm2)
\end{equation}
where w0j denotes the random intercept for tasks/items, which should be included to avoid inflation of Type I errors (Baayen, Davidson, and Bates 2008).
A common method to obtain p values for LMMs is to use likelihood ratio tests. However, Huber et al. (2015) show that the Type I error rate for this method is only at an acceptable level when the size of the control sample is larger than 50. In contrast, using parametric bootstrapping or one-tailed t tests with the degrees of freedom approximated by the Satterthwaite method kept the error rate at its nominal level (or rather somewhat above, implying that a slightly more conservative α value should be considered) even for small control sample sizes. As a reference they compared the error rate and statistical power of the mixed model to those of the TD using aggregated data, i.e. taking the mean over all tasks/items for each participant, which is what Crawford and Howell (1998) suggest for this type of data. The p value of β1 in (@ref(eq:lmm2)) using the Satterthwaite method will be approximately equal to the two-sided p value from the TD on scores averaged over tasks/items for each participant, given a large enough sample size.
Furthermore, because LMM can model data with dependency between observations, it is also possible to model effects within participants, i.e. between conditions. The model equation would then be:
\begin{equation}
y_{ij} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2ij} + v_{0i} + w_{0j} + \epsilon_{ij},
(#eq:lmm3)
\end{equation}
where β2 is the raw effect of the condition indicated by the categorical variable x2. By introducing an interaction term between x1 and x2 in this model we can even test for the presence of a dissociation. This model would take the form:
\begin{equation}
y_{ij} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2ij} + \beta_3 x_{1i} x_{2ij} + v_{0i} + v_{1i} x_{2ij} + w_{0j} + \epsilon_{ij},
(#eq:lmm4)
\end{equation}
where β3 is the raw effect of the dissociation and v1i is the random slope for each participant. That is, v1i accounts for the between-participant variability of the effect of the condition. A caveat of using LMM is that the model requires more observations than specified parameters to be identifiable. When specifying random slopes and random intercepts for each participant one must therefore have more than 2n observations, implying that it would not be a good model for testing a dissociation between two tasks without repeated measurements within each task.
There are already several implementations of LMMs in various open
source software which can be used directly in the context of single case
comparisons as described above. Therefore this methodology has not been
implemented in singcar
. However, due to the usefulness of
the concept and the lack of methodology in singcar
to
handle repeated measures data I will give a brief description of how one
can specify an LMM in R
using the package lme4
(Bates et al.
2015) and lmerTest
(Kuznetsova, Brockhoff,
and Christensen 2017) (the latter is used to obtain p values from t tests with Satterthwaite
approximation of the degrees of freedom).
R
Begin by installing and loading required packages:
install.packages(c("lme4", "lmerTest", "MASS"))
library("lme4")
library("lmerTest")
library("MASS") # For multivariate simulation
Because the example data is not suited for repeated measures analysis a simulated dataset will be used. Say that we have 30 measurements from four normally distributed variates Yi, i = 1, 2, 3, 4. Call these variates items. They are thought to converge on some cognitive ability of interest and have the distribution: $$ \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{pmatrix} \sim \mathcal{N}\left( \begin{bmatrix} 100 \\ 80 \\ 50 \\ 20 \end{bmatrix}, \begin{bmatrix} 15^2 & 108 & 78 & 30 \\ 108 & 10^2 & 12 & 15 \\ 78 & 12 & 10^2 & 25 \\ 30 & 15 & 25 & 5^2 \end{bmatrix} \right). $$ The case has incurred a deficit of two standard deviations on all four variates. These data could then be simulated as follows:
simdata <- expand.grid(case = c(1, rep(0, 29)), item = factor(1:4))
simdata$ID <- factor(rep(1:30, 4))
set.seed(123) # For replicability
mu <- c(100, 80, 50, 20)
sigma <- matrix(c(15^2, 108, 78, 30,
108, 10^2, 12, 15,
78, 12, 10^2, 25,
30, 15, 25, 5^2), nrow = 4, byrow = T)
dv <- mvrnorm(30, mu = mu, Sigma = sigma) # Dependent variable
dv[1, ] <- dv[1, ] + c(-30, -20, -20, -10) # Case scores
simdata$dv <- c(dv)
The linear mixed model to test the deficit of the case would then be specified as follows:
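Assuming lmerTest is loaded, so that Satterthwaite-based p values are reported, a call of this form (with the random effects structure described just below) produces the estimates shown:

model1 <- lmer(dv ~ case + (1 | ID) + (1 | item), data = simdata)
summary(model1)[["coefficients"]]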
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 62.15587 17.512303 3.030078 3.549269 0.037506019
## case -25.07172 7.793911 27.999869 -3.216834 0.003263006
Where (1|ID)
and (1|item)
specify the
random effects structure, in this case the random intercepts for
participant and item. This is approximately the same t statistic that would be produced
by the test of deficit if averaging the scores for each participant:
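For example (a sketch; TD() argument names as assumed earlier):

agg <- rowMeans(dv)
TD(case = agg[1], controls = agg[-1])[["statistic"]]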
## t
## -3.21684
Now, say that we want to model a dissociation using LMM. We have collected 30 measurements from two items during two different conditions. It is thought that these two conditions tap two independent cognitive abilities but that the items collected within each condition converge on the same ability. Hence we have four variates Yij, i = 1, 2, j = 1, 2, distributed according to the multivariate distribution:
\begin{equation}
\begin{pmatrix} Y_{11} \\ Y_{21} \\ Y_{12} \\ Y_{22} \end{pmatrix} \sim \mathcal{N}\left( \begin{bmatrix} 100 \\ 80 \\ 50 \\ 20 \end{bmatrix}, \begin{bmatrix} 15^2 & 120 & 45 & 7 \\ 120 & 10^2 & 11 & 15 \\ 45 & 11 & 10^2 & 40 \\ 7 & 15 & 40 & 5^2 \end{bmatrix} \right),
(#eq:lmmdist)
\end{equation}
where e.g. Y12 is the random variable corresponding to the first item in the second condition. Given that the two conditions in fact tap different cognitive structures, the incurred impairment of the case would potentially affect performance in one of the conditions more than in the other. Hence, we impose a deficit of two standard deviations on the case scores in one of the conditions. The data can then be simulated as follows:
simdata <- expand.grid(case = c(1, rep(0, 29)),
item = factor(1:2), condition = factor(1:2))
simdata$ID <- factor(rep(1:30, 4))
contrasts(simdata$condition) <- contr.sum(2) # Effect coding
set.seed(123) # For replicability
mu <- c(100, 80, 50, 20)
sigma <- matrix(c(15^2, 120, 45, 7,
120, 10^2, 11, 15,
45, 11, 10^2, 40,
7, 15, 40, 5^2), nrow = 4, byrow = T)
dv <- mvrnorm(30, mu = mu, Sigma = sigma) # Dependent variable
dv[1, ] <- dv[1, ] + c(-30, -20, -0, -0) # Case scores
simdata$dv <- c(dv)
The fully crossed linear mixed model used to test a dissociation for these data would then be specified like so:
model2 <- lmer(dv ~ case*condition + (condition|ID) + (1| item), data = simdata)
round(summary(model2)[["coefficients"]], 5)
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 61.87447 12.18065 1.02161 5.07973 0.11992
## case -17.24035 7.66804 28.00044 -2.24834 0.03261
## condition1 28.08855 0.98494 28.00011 28.51813 0.00000
## case:condition1 -13.82757 5.39472 28.00011 -2.56317 0.01603
Comparing the p value of
the interaction effect to the p value obtained from
RSDT()
we see that they are far from equivalent.
agg_data1 <- rowMeans(dv[ , 1:2])
agg_data2 <- rowMeans(dv[ , 3:4])
RSDT(agg_data1[1], agg_data2[1], agg_data1[-1], agg_data2[-1])[["p.value"]]
## [1] 0.06737687
However, a small simulation study conducted with the data described above revealed that the RSDT and the LMM yield comparable results regarding power and Type I error. The simulation function used for this purpose (called powerLMM() in the calls below) takes the specified impairments of the case on each variate and returns the proportion of significant interaction effects from (@ref(eq:lmm4)) out of the total number of simulations run, as well as the same proportion obtained from either the RSDT or the BSDT. The distribution of the data in this small simulation study is the same as in (@ref(eq:lmmdist)).
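A function along these lines could be sketched as follows (assuming lme4, lmerTest, MASS and singcar are available; this is an illustration of the procedure rather than the exact code used to produce the results below):

powerLMM <- function(nsim, case_impairment,
                     compare_against = c("RSDT", "BSDT"), alpha = 0.05) {
  compare_against <- match.arg(compare_against)
  mu <- c(100, 80, 50, 20)
  sigma <- matrix(c(15^2, 120, 45, 7,
                    120, 10^2, 11, 15,
                    45, 11, 10^2, 40,
                    7, 15, 40, 5^2), nrow = 4, byrow = TRUE)
  simdata <- expand.grid(case = c(1, rep(0, 29)),
                         item = factor(1:2), condition = factor(1:2))
  simdata$ID <- factor(rep(1:30, 4))
  contrasts(simdata$condition) <- contr.sum(2)
  sig_lmm <- sig_ref <- logical(nsim)
  for (i in seq_len(nsim)) {
    dv <- MASS::mvrnorm(30, mu = mu, Sigma = sigma)
    dv[1, ] <- dv[1, ] + case_impairment
    simdata$dv <- c(dv)
    # Fully crossed LMM; interaction p value via Satterthwaite approximation
    fit <- lmerTest::lmer(dv ~ case * condition + (condition | ID) + (1 | item),
                          data = simdata)
    p_lmm <- summary(fit)[["coefficients"]]["case:condition1", "Pr(>|t|)"]
    # Aggregate over the two items within each condition for the reference test
    agg1 <- rowMeans(dv[, 1:2]); agg2 <- rowMeans(dv[, 3:4])
    p_ref <- if (compare_against == "RSDT") {
      singcar::RSDT(agg1[1], agg2[1], agg1[-1], agg2[-1])[["p.value"]]
    } else {
      singcar::BSDT(agg1[1], agg2[1], agg1[-1], agg2[-1])[["p.value"]]
    }
    sig_lmm[i] <- p_lmm < alpha
    sig_ref[i] <- p_ref < alpha
  }
  out <- data.frame(mean(sig_ref), mean(sig_lmm))
  names(out) <- c(compare_against, "LMM")
  out
}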
R> powerLMM(nsim = 1000, case_impairment = c(-30, -20, -0, -0),
+ compare_against = "RSDT", alpha = 0.05)
RSDT LMM
1 0.353 0.462
It seems like the method based on the linear mixed model has quite a good power advantage over the RSDT for data simulated from this particular distribution. To compare the error rates of the two tests we can use the same ratio as for power when there is no true effect present. In this case that would be either when the case has no incurred impairment at all or when the impairments are equal in both conditions.
R> powerLMM(nsim = 1000, case_impairment = c(-0, -0, -0, -0),
+ compare_against = "RSDT", alpha = 0.05)
RSDT LMM
1 0.032 0.039
Even though a power advantage of the mixed model was observed it seems to produce an error rate under the nominal level of 0.05 as well. If this was a consistent behaviour it would eliminate the usefulness of the RSDT. However, if the case exhibits extreme impairments in both conditions we would have another null scenario. This was discussed previously in Section @ref(sec22) where it was noted that one of the reasons Crawford and Garthwaite (2007) developed a Bayesian approach to test for dissociations was the highly increasing error rate of the RSDT, for increasing deficits on both tasks with no discrepancy between them. Say that the case has impairments of six standard deviations on both items in both conditions and let us compare the LMM to the BSDT instead of the RSDT:
R> powerLMM(nsim = 1000, case_impairment = c(-90, -60, -60, -30),
+ compare_against = "BSDT", alpha = 0.05)
BSDT LMM
1 0.053 0.61
The BSDT seems to keep the error rate almost at its nominal level while it is enormous for the LMM. Even though a more extensive simulation study with various types of data structures should be performed to solidify these findings, the preliminary results shown here indicate some limitations of using LMM for dissociations and reiterates the recommendation of Crawford and Garthwaite (2007) to use the BSDT to test for dissociations, even when using aggregated repeated measures data. That is, because the effects of brain damage can be severe I would recommend against using LMM to model dissociations.
Up until now I have discussed univariate testing methods. However, as discussed in the introduction, Section @ref(section1), stronger evidence for either deficits or dissociations is provided by using multiple measures (tasks) for a cognitive function of interest. Conceptually, what we want to estimate or test in this scenario is the distance between a vector of observations for a single case and a vector of population means. A common way to do this is by using the Mahalanobis distance measure. If y* is a vector of task observations from a single case and μ is the population mean vector, the Mahalanobis distance, δ, is a measure of the distance between the case and the population mean, given by: $$ \delta= \sqrt{(\boldsymbol{y}^*-\boldsymbol{\mu})'\Sigma^{-1}(\boldsymbol{y}^*-\boldsymbol{\mu})}. $$
Given that the population follows a multivariate normal distribution we have that δ2 ∼ χν12 where ν1 is the degrees of freedom and the number of dimensions in the multivariate distribution. This squared distance is sometimes referred to as the Mahalanobis index. Because the purpose of case-comparison methodology is to estimate abnormality we are interested in p, that is the proportion of the control population that would exhibit larger δ than a single case. If we need to estimate μ and Σ the sample Mahalanobis distance is given by $$ \hat\delta = \sqrt{(\boldsymbol{y}^*-\overline{\boldsymbol{y}})'\boldsymbol{S}^{-1}(\boldsymbol{y}^*-\overline{\boldsymbol{y}})}, $$ where $\overline{\boldsymbol{y}}$ is the sample mean vector and S the sample covariance matrix. Using the Mahalanobis index between a case and the mean of a normative sample, the abnormality is estimated with $$ \hat{p} = \mathbb{P}[(\boldsymbol{y}-\overline{\boldsymbol{y}})'\boldsymbol{S}^{-1}(\boldsymbol{y}-\overline{\boldsymbol{y}}) > (\boldsymbol{y}^*-\overline{\boldsymbol{y}})'\boldsymbol{S}^{-1}(\boldsymbol{y}^*-\overline{\boldsymbol{y}})]. $$
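As a small illustration, the sample Mahalanobis index of case DF relative to the controls on the two size-weight illusion variates can be computed with base R's mahalanobis(), which returns the squared distance:

ctrl <- cbind(contA, contB)
lambda0 <- mahalanobis(c(caseA, caseB), center = colMeans(ctrl), cov = cov(ctrl))
lambda0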
An obvious way to estimate p would be to use Hotelling's T² test. The T² statistic is proportional to an F-distributed quantity and hence its associated p value seems like an intuitive choice as an estimate of p. This was in fact proposed by Huizenga et al. (2007), but Elfadaly, Garthwaite, and Crawford (2016) show that this estimate is biased and instead devise several other candidates. None of these was shown to be perfectly unbiased, but several of them performed better than the estimate based on the T² statistic.
If we let $\lambda = \delta^2 = (\boldsymbol{y}^*-\boldsymbol{\mu})'\Sigma^{-1}(\boldsymbol{y}^*-\boldsymbol{\mu})$ be the case's Mahalanobis index and $(\boldsymbol{y}-\boldsymbol{\mu})'\Sigma^{-1}(\boldsymbol{y}-\boldsymbol{\mu}) \sim \chi^2_{\nu_1}$ be the Mahalanobis index of a random observation $\boldsymbol{y}$ from the control population, what we want to estimate is $p = \mathbb{P}[\chi^2_{\nu_1} > \lambda]$. If $\overline{\boldsymbol{y}}$ and $\hat\Sigma = \frac{n-1}{n}\boldsymbol{S}$ denote the maximum likelihood estimates of μ and Σ, the Hotelling's T² based estimate of p, p̂F, is $$ \hat{p}_F = \mathbb{P}\left[(\boldsymbol{y}-\overline{\boldsymbol{y}})'\hat\Sigma^{-1}(\boldsymbol{y}-\overline{\boldsymbol{y}})>(\boldsymbol{y}^*-\overline{\boldsymbol{y}})'\hat\Sigma^{-1}(\boldsymbol{y}^*-\overline{\boldsymbol{y}})\right]. $$ If we let $\lambda_0 = (\boldsymbol{y}^*-\overline{\boldsymbol{y}})'\boldsymbol{S}^{-1}(\boldsymbol{y}^*-\overline{\boldsymbol{y}})$, we have that $$ T^2=\frac{n\lambda_0}{n+1} \ \text{and} \ \hat{p}_F = \mathbb{P}\left[F_{\nu_1, \nu_2}>\frac{\nu_2}{\nu_1(n-1)}T^2\right], $$ i.e., this estimate is the p value associated with testing the hypothesis that the case comes from the control population, using a Hotelling's T² test. As such it is also often used as a point estimate of p. However, we can also use the fact that $\lambda \sim \chi^2_{\nu_1}$ and take the p value from the χ² distribution, using the sample estimates only on the right hand side of the inequality: $$ \hat{p}_{\chi^2} = \mathbb{P}\left[(\boldsymbol{y}-\boldsymbol{\mu})'\Sigma^{-1}(\boldsymbol{y}-\boldsymbol{\mu})>(\boldsymbol{y}^*-\overline{\boldsymbol{y}})'\hat\Sigma^{-1}(\boldsymbol{y}^*-\overline{\boldsymbol{y}})\right]. $$ We then have $$ \hat{p}_{\chi^2} = \mathbb{P}\left[\chi^2_{\nu_1}>\frac{n}{n-1}\lambda_0\right]. $$ This is an intuitive estimator: it was shown to exhibit fairly low bias (lower than p̂F) for low true values of p, and it had the smallest mean square error of all estimators considered by Elfadaly, Garthwaite, and Crawford (2016). Unfortunately, as p increases, p̂χ² starts to show substantial bias.
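Continuing the sketch above, both of these naive estimates can be computed directly from λ0. Here it is assumed, as later in the text, that ν1 equals the number of tasks k and ν2 = n − ν1:
# p-hat_F from the F distribution and p-hat_chi2 from the chi-square
# distribution, for the hypothetical controls and case simulated above.
n <- nrow(controls); k <- ncol(controls)
T2    <- n * lambda0 / (n + 1)                                   # Hotelling's T^2
p_F   <- pf((n - k) / (k * (n - 1)) * T2, k, n - k, lower.tail = FALSE)
p_chi <- pchisq(n / (n - 1) * lambda0, df = k, lower.tail = FALSE)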
Both of these estimates are intuitive, but since they are both biased, Elfadaly, Garthwaite, and Crawford (2016) recommend using other estimators, chosen according to the needs of the analysis. One of these is based on a modified confidence interval for the Mahalanobis distance and two are based on quadrature polynomial approximation of the χ² distribution. In Section @ref(pd) I describe two estimators based on confidence intervals of δ, which are both implemented in singcar along with p̂F and p̂χ².
Reiser (2001) suggested a method to construct confidence intervals on δ². Using this method, Elfadaly, Garthwaite, and Crawford (2016) propose a way to construct confidence intervals on p as well. We have $$ F_0 = \frac{n\nu_2}{(n-1)\nu_1}\lambda_0 \sim F_{\nu_1, \nu_2}(n\lambda), $$ that is, the sample statistic F0 has a non-central F distribution on ν1 and ν2 degrees of freedom with non-centrality parameter nλ. We call the value of nλ for which F0 is the α-quantile of this distribution Lα. We can then construct the lower and upper boundaries of a confidence interval for p using the CDF of the χ²-distribution on ν1 degrees of freedom, denoted G(.), as $$ \hat{p}_L = 1 - G(L_{\alpha/2}/n) \quad \text{and} \quad \hat{p}_U = 1 - G(L_{1-\alpha/2}/n). $$
Similarly, they suggest a point estimator of p based on the median of Fν1, ν2(nλ): if we let L0.5 be the value of nλ for which F0 is the median of Fν1, ν2(nλ), the estimator is $$ \hat{p}_D = 1 - G(L_{0.5}/n). $$
In the same way, a confidence interval around λ is constructed by choosing λl and λu such that $$ \mathbb{P}\left[F_{\nu_1, n-\nu_1}(n\lambda_l)\leq F_0\right] = 1-\frac{\alpha}{2} \quad \text{and} \quad \mathbb{P}\left[F_{\nu_1, n-\nu_1}(n\lambda_u)\leq F_0\right] = \frac{\alpha}{2}. $$
However, this method has a drawback. Its coverage exactly matches the nominal level, i.e. of all the, say, 95% intervals (assuming α = 0.05), 95% will in fact contain the true value of δ, as proved by Wunderlich et al. (2015). But when F0 is small enough that $\mathbb{P}[F_{\nu_1,n- \nu_1}(0)\leq F_0] \leq 1- \frac{\alpha}{2}$, we cannot find a λl such that $\mathbb{P}\left[F_{\nu_1,n- \nu_1}(n\lambda_l)\leq F_0\right] =1- \frac{\alpha}{2}$, and λl is instead set to 0. Similarly, we set λu to 0 when $\mathbb{P}[F_{\nu_1,n- \nu_1}(0)\leq F_0] \leq \frac{\alpha}{2}$. But we know with full certainty that an interval of [0, 0] will not contain λ, since y* is continuous with ℙ[y* = μ] = 0.
This problem carries over to the construction of confidence intervals for p, (@ref(eq:DCI)), as well as to the estimator p̂D, (@ref(eq:pmed)). The median of the distribution Fν1, ν2(nλ) of course decreases as the non-centrality parameter nλ decreases. Since λ ≥ 0, the lower limit of the median of Fν1, ν2(nλ) is the median of this distribution when nλ = 0, i.e. the median of a central F distribution on ν1 and ν2 degrees of freedom. So, using the approach of Reiser (2001), if F0 is smaller than this limit one sets nλ to zero.
We see, however, a bias in this estimator when F0 is small, even if it is not smaller than this lower limit. Elfadaly, Garthwaite, and Crawford (2016) give as an example a scenario with ν1 = 4, ν2 = 20 and λ0 = 0.4, hence n = 24 and $F_0 = \frac{24 \times 20}{23 \times 4}\times 0.4 \approx 2.09$. The value of the non-centrality parameter nλ for which the median of Fν1, ν2(nλ) equals 2.09 is approximately 5.015, and p̂D = 1 − G4(5.015/24) = 0.9949, or 99.49%. So with an estimated Mahalanobis index of 0.4 we would infer that only 0.51% of the population exhibits a smaller Mahalanobis index. If 0.4 were instead the true case value, the true proportion of the population exhibiting a larger Mahalanobis index would be p = 1 − G4(0.4) = 0.9825, or 98.25%; that is, 1.75% would exhibit a smaller index, which is quite a considerable discrepancy.
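These numbers can be checked with a few lines of base R; this is only a sketch of the example above, not code from singcar:
nu1 <- 4; n <- 24; nu2 <- n - nu1; lambda0 <- 0.4
F0 <- (n * nu2) / ((n - 1) * nu1) * lambda0                               # ~2.09
# non-centrality parameter n*lambda at which F0 is the median of F(nu1, nu2, ncp)
L05 <- uniroot(function(ncp) pf(F0, nu1, nu2, ncp) - 0.5, c(0, 50))$root  # ~5.0
p_D    <- 1 - pchisq(L05 / n, df = nu1)                                   # ~0.995
p_true <- 1 - pchisq(lambda0, df = nu1)                                   # ~0.9825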
Garthwaite, Elfadaly, and Crawford (2017) proposed a solution to this in the construction of confidence intervals of δ by modifying the method of Reiser (2001). They suggested forming a Bayesian credible interval instead of the frequentist confidence interval whenever F0 is small. Credible intervals have the more intuitive interpretation that they contain the true value of interest with a probability of 1 − α. As such, this interval will not give rise to unreasonable values such as [0, 0]. But to form a credible interval, a prior must be specified.
Let y be a vector of values from an observation of the control population, not necessarily the case. We denote the prior for δ when δ is the Mahalanobis distance of the case as π(δ), and when δ is the distance between the mean of the controls and a randomly selected control observation as ψ(δ). A priori, δ should be larger when it is calculated from the observation of the case than from a randomly selected observation of the controls. Garthwaite, Elfadaly, and Crawford (2017) assume that, for any δ̂ and any specified quantile, πq(δ|δ̂) ≥ ψq(δ|δ̂), where πq(.) and ψq(.) are the qth quantiles of the respective posterior distributions.
Note that they do not aim to specify π(δ); instead they want to put limits on the quantiles of the resultant posterior through ψ(δ). This is because we know neither π(δ) nor πq(δ|δ̂), but it is possible to establish ψ(δ|δ̂), and the α/2 and 1 − α/2 quantiles of this distribution are then the lower and upper boundaries of a 1 − α credible interval for δ.
The logic behind this modification is intuitive: treat the case like a randomly drawn observation from the control distribution whenever the case observation is more similar to the average of the controls than the majority of the control observations are. That is, we take whichever boundaries are lower, those of the confidence interval or those of the credible interval.
Again, we set λ = δ² and λ̂ = δ̂², and let ψ*(λ) be the prior corresponding to ψ(δ); this is a χ² distribution on ν1 degrees of freedom. If we set m = n/(n − 1), the likelihood of λ, L(λ; λ̂), is given by Combining the prior with the likelihood, ψ*(λ)L(λ; λ̂), yields the posterior ψ*(λ|λ̂):
Here B(., .) and Γ(.) are the beta and gamma functions respectively, and c is the normalising constant, which is obtained through numerical integration of (@ref(eq:post)) over 0 < λ < ∞.
Sums over infinite ranges are problematic for practical computation. One way to deal with this is to set a reasonable endpoint instead of infinity. For the implementation in singcar the endpoint of the infinite sum was chosen by noting that the denominator in (@ref(eq:post)) grows towards infinity faster than the numerator. In fact, in R and many other languages r! overflows to infinity for r > 170, so a reasonable boundary for the infinite sum is 170, because any finite value divided by infinity is 0. However, for some parameter values the numerator reaches infinity before the denominator does. Infinity divided by any real value is still infinity, and infinity divided by infinity is undefined. Therefore, to be able to sum, any infinite or undefined terms are removed if present. As such, the integral over the sum in (@ref(eq:post)) will always be convergent and the normalising constant c can be calculated.
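A minimal sketch of this truncation strategy, using a made-up series term rather than the actual terms of (@ref(eq:post)):
# Hypothetical series term: x^r / (r! * Gamma(r + 2)); for large r the factors
# overflow to Inf, producing Inf or NaN terms that are dropped before summing.
term <- function(r, x = 300) x^r / (factorial(r) * gamma(r + 2))
terms <- vapply(0:170, term, numeric(1))   # truncate the infinite sum at r = 170
sum(terms[is.finite(terms)])               # drop non-finite terms, then sum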
singcar
uses the function integrate
in the
base R
package stats
to do this. This numerical integration procedure, based on routines from the QUADPACK package (Piessens et al. 2012), usually yields accurate results. However, it subdivides the interval to be integrated, and when integrating to infinity the region of interest may be undersampled. Hence, in singcar
we set: $$
c = \int_0^{0.5\sqrt{\hat\lambda}}\psi^*(\lambda |
\hat{\lambda}) d\lambda +
\int_{0.5\sqrt{\hat\lambda}}^{\sqrt{\hat\lambda}}\psi^*(\lambda |
\hat{\lambda}) d\lambda +
\int_{\sqrt{\hat\lambda}}^{1.5\sqrt{\hat\lambda}}\psi^*(\lambda |
\hat{\lambda}) d\lambda +
\int_{1.5\sqrt{\hat\lambda}}^{2\sqrt{\hat\lambda}}\psi^*(\lambda |
\hat{\lambda}) d\lambda +
\int_{2\sqrt{\hat\lambda}}^{\infty}\psi^*(\lambda | \hat{\lambda}) \
d\lambda.
$$ This ensures heavier sampling around $\sqrt{\hat\lambda}$, which is the region we are most interested in. Further, for large values of λ̂ some discontinuities are introduced in the posterior, which can also be problematic for numerical integration, but the above subdivision alleviates this problem somewhat.
The qth quantile of the posterior ψ*(λ|λ̂), λq, is then found by an optimisation search procedure (nlminb in stats) that aims to minimise $\left|\int_0^{\lambda_q}\psi^*(\lambda|\hat\lambda)\,d\lambda - q\right|$.
To get an estimate of p we
use the same procedure to find the median of the posterior, λ0.5. The modified estimator of p proposed by Elfadaly, Garthwaite, and Crawford (2016), p̂MD, is then: p̂MD = min[p̂D, 1 − G(λ0.5)].
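As an illustration of the scheme just described (the piecewise calls to integrate and the nlminb quantile search), here is a minimal sketch with a stand-in density in place of the actual unnormalised posterior ψ*(λ)L(λ; λ̂), which is not reproduced here:
# Stand-in integrand (a chi-square density); the real code integrates the
# unnormalised posterior psi*(lambda) * L(lambda; lambda_hat) instead.
f <- function(lambda) dchisq(lambda, df = 4)
lambda_hat <- 9
cuts <- c(0, 0.5, 1, 1.5, 2) * sqrt(lambda_hat)
# normalising constant c as the sum of the five sub-integrals shown above
cc <- sum(mapply(function(lo, hi) integrate(f, lo, hi)$value,
                 lo = cuts, hi = c(cuts[-1], Inf)))
post_cdf <- function(x) integrate(f, 0, x)$value / cc
# q-th posterior quantile via nlminb, minimising |CDF(lambda_q) - q|
lambda_05 <- nlminb(lambda_hat, function(x) abs(post_cdf(x) - 0.5), lower = 0)$par
1 - pchisq(lambda_05, df = 4)   # 1 - G(lambda_0.5); compare with p_D as above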
Note, however, that because c will sometimes be very small, and hence 1/c very large, computation of the normalised posterior can be very slow. This mostly happens when δ̂ is so large that p̂D would be greater than 1 − G(λ0.5) anyway. In singcar it is therefore recommended to use p̂D, but to calculate p̂MD if the Mahalanobis distance of the case seems suspiciously small.
The difference between the estimators p̂D and p̂MD is small, and Elfadaly, Garthwaite, and Crawford (2016) show that their bias and mean square error are almost identical, but they nevertheless recommend p̂MD. This is because, as δ̂ → 0, p̂D → 1 much faster than it reasonably should.
It should be noted that the implementation of p̂MD in singcar is unstable compared to the implementation by Garthwaite, Elfadaly, and Crawford (2017) written in C. I have not been able to find the cause of this discrepancy, but it only occurs for relatively large δ̂. It is likely that the numerical integration procedures differ and that the procedure used in the implementation by Garthwaite, Elfadaly, and Crawford (2017) handles discontinuous functions better, such as the posterior (@ref(eq:post)) for high values of δ̂. Alternatively, the discontinuities cause problems for the optimisation routine. It is therefore recommended that both estimators are calculated and the smaller of the two is chosen; if that is p̂MD, then compare the result to the implementation by Garthwaite, Elfadaly, and Crawford (2017).
Say that we want to know the case abnormality in the example data on both tasks simultaneously. To estimate this abnormality in the multivariate space with singcar, call:
MTD(case = c(caseA, caseB), controls = cbind(contA, contB),
conf_level = 0.95, method = c("pd", "pchi", "pf", "pmd"),
mahalanobis_dist = NULL, k = NULL, n = NULL)
As can be seen, p̂D is the default method for estimating the abnormality. This is because the difference in results between p̂D and p̂MD is very small, while the difference in computation time is huge. In addition, "pmd" in singcar actually computes 1 − G(λ0.5) rather than min[p̂D, 1 − G(λ0.5)]; taking the minimum is left to the user. The reason for this was to minimise confusion. However, I realise that it could have the opposite effect and it might be changed in future updates.
The p value in the output of MTD() is the p value associated with the Hotelling's T² statistic and is the same as the abnormality estimate if the method pf is chosen. The last three arguments are only required if summary statistics are used.
A further capacity of singcar is that it can be used to calculate the statistical power of the tests. The notion of power when comparing cases to control samples has been somewhat overlooked for this class of statistical tests. In recent work (McIntosh and Rittmo 2020), we argued that, even though power is inherently limited in this paradigm, a priori calculations are still useful for study design and interpretation in neuropsychological and other applications. Calculating power for the test of deficit is similar to calculating power for any t test and can be done analytically: $$ \text{power} = T_{n-1}\left(t_{\alpha, n-1} \,\Big|\, \frac{y^* - \overline{y}}{\sigma \sqrt{\frac{n+1}{n}}}\right), $$ where Tn − 1(.|θ) is the cumulative distribution function for the non-central t distribution with n − 1 degrees of freedom and non-centrality parameter $\frac{y^* - \overline{y}}{\sigma \sqrt{\frac{n+1}{n}}}$ (i.e., TD, Equation @ref(eq:TD)), and tα, n − 1 is the α quantile of the t distribution on n − 1 degrees of freedom (note that this is for a one-sided test). For the unstandardised difference test, power is calculated in an analogous way by using Equation @ref(eq:UDT) as the non-centrality parameter. Deriving power for the other functions in an analytic manner is however not possible (the RSDT is only approximately t distributed) and a Monte Carlo approach has been used for these tests. To call any power calculator in the package one simply uses the function name with _power added as a suffix.
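As a quick sanity check of the analytic formula above, here is a sketch in base R (not package code), using the same values as the TD_power() example that follows:
# One-sided ("less") power for the test of deficit: case score 70, control
# mean 100, sd 15, n = 16, alpha = 0.05.
ncp  <- (70 - 100) / (15 * sqrt((16 + 1) / 16))   # non-centrality parameter
crit <- qt(0.05, df = 16 - 1)                     # alpha quantile of t(n - 1)
pt(crit, df = 16 - 1, ncp = ncp)                  # ~0.58, cf. TD_power() below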
So, for example, to calculate power for the test of deficit we call
TD_power()
. The expected case score and either sample size
or desired power must be supplied. The mean and standard deviation of
the control sample can also be specified with the arguments
mean
and sd
. If not, they take the default
values of 0 and 1 respectively so that the case score is interpreted as
distance from the mean in standard deviations. A conventional α-level of 0.05 is assumed if nothing else is supplied.
The alternative hypothesis can also be specified by the argument
alternative
: specify "less"
(default) or
"greater"
for a one-tailed test, specify
"two.sided"
for a two-tailed test.
TD_power(case = 70,
mean = 100,
sd = 15,
sample_size = 16,
power = NULL,
alternative = "less",
alpha = 0.05,
spec = 0.005)
## [1] 0.5819579
TD_power()
can also be used to calculate required sample
size for a desired level of power. For example, if we specify a desired
power level of 0.6, leave sample_size
to its default and
let the rest of the arguments be as in the previous example, we see from
the output of the function that power will not increase more than 0.5%
for any additional participant after a sample size of 15. That is, the
algorithm stops searching when this level of specificity has been
reached and we are nearing the asymptotic maximum power for this effect
size. We can increase the specificity by lowering the spec
argument.
TD_power(case = 70,
mean = 100,
sd = 15,
sample_size = NULL,
power = 0.6,
alternative = "less",
alpha = 0.05,
spec = 0.005)
## Power (0.578) will not increase more than 0.5% per participant for n > 15
## n power
## 1 15 0.5780555
Power calculators for the Bayesian tests of deficit cannot calculate
required sample size. This is because they rely on simulation methods to
estimate approximate power and deploying a search algorithm to find the
required sample size for a given level of power would be computationally
too intense. The syntax is otherwise relatively similar to that of
TD_power()
. For BTD_power()
we have the two
extra arguments nsim
and iter
, indicating the
number of simulations used in the power function and by
BTD()
, respectively.
BTD_power(case = 70,
mean = 100,
sd = 15,
sample_size = 15,
alternative = "less",
alpha = 0.05,
nsim = 1000,
iter = 1000)
## [1] 0.574
The only difference in syntax of BTD_cov_power()
is due
to the inclusion of covariates. The variate of interest must be
specified as a vector where the first element gives the mean and the
second the standard deviation in the argument control_task
.
The covariates can be specified similarly or as an m × 2 matrix where the first column
gives the means of each covariate and the second column gives the
standard deviations. The correlation matrix of the variates must be
given as well. In the example below, power is evaluated for a test
taking two covariates into account, both with a mean of 0 and a standard
deviation of 1. The correlation is specified as a 3 × 3 matrix with pairwise correlations of
0.3. The default settings include only
one covariate having a 0.3 correlation
with the variate of interest. This function is computationally intense
and hence, the number of simulations has, for the example below, been
decreased.
covars <- matrix(c(0, 1,
0, 1), ncol = 2, byrow = TRUE)
BTD_cov_power(case = -2,
case_cov = c(0.2, -0.6),
control_task = c(0, 1),
control_covar = covars,
cor_mat = diag(3) + 0.3 - diag(c(0.3, 0.3, 0.3)),
sample_size = 15,
alternative = "less",
alpha = 0.05,
nsim = 100,
iter = 100)
## [1] 0.48
For the difference tests one must supply the expected case scores on
both variates as well as sample size. The means and standard deviations
of the control sample can also be specified. If unspecified, they take
on the default values of 0 and 1 respectively, so that the expected case
scores are interpreted as distances from the means in standard
deviations. RSDT_power()
, BSDT_power()
and
UDT_power()
additionally require an estimate of the sample
correlation between the variates of interest, r_ab
. If this
is not specified a correlation of 0.5 is assumed by default. For
BSDT_cov_power()
the correlation matrix between the
variates of interest and the covariates must instead be supplied (i.e.,
at least a 3 × 3 matrix where the first
correlation is the correlation between the variates of interest).
The alternative hypothesis is by default assumed to be
"two.sided"
since the direction of the effect is dependent
on the order of the inputs, but can be specified to be
"less"
or "greater"
as well. The syntax is
similar for all three functions but with small differences. For
UDT_power()
one can request required sample size for a
desired power, as for TD_power()
. Calculators for the Bayesian tests have the extra argument calibrated, with which the prior can be specified. BSDT_cov_power()
requires input
in the same format as BTD_cov_power()
for both
control_tasks
and control_covar
. The two
examples below demonstrate usage for RSDT_power()
and
BSDT_cov_power()
.
RSDT_power(case_a = 70,
case_b = 55,
mean_a = 100,
mean_b = 50,
sd_a = 15,
sd_b = 10,
r_ab = 0.5,
sample_size = 15,
alternative = "two.sided",
alpha = 0.05,
nsim = 1000)
## [1] 0.604
cor_mat <- matrix(c(1, 0.5, 0.6,
0.5, 1, 0.3,
0.6, 0.3, 1), ncol = 3, byrow = TRUE)
BSDT_cov_power(case_tasks = c(70, 55),
case_cov = 65,
control_tasks = matrix(c(100, 15,
50, 10), ncol = 2, byrow = TRUE),
control_covar = c(50, 25),
cor_mat = cor_mat,
sample_size = 15,
alternative = "two.sided",
alpha = 0.05,
nsim = 100,
iter = 100,
calibrated = TRUE)
## [1] 0.75
This vignette has reviewed several statistical methods to compare a single case to a small control sample. It has exemplified practical usage of these methods with implementations in the package where possible, and in lme4 where not. In a very small simulation study, using repeated measures data and linear mixed models was shown to potentially produce higher power than the more standard RSDT when testing for a discrepancy between two variates. However, when no discrepancy was present but the case exhibited severe impairments on both variates, the mixed model yielded an extremely high error rate for the specific scenario investigated. So when estimating dissociations it is recommended to use one of the Bayesian tests on aggregated data.
Since more complex model structures can be created with linear mixed models than with the more standard procedures, it is unfortunate that their usefulness for detecting dissociations seems limited. On the other hand, this is somewhat made up for by the implementation of the Bayesian regression techniques in Section @ref(cov). In addition, the implementation of (relatively) unbiased techniques to estimate case abnormality in multidimensional space, Section @ref(section4), also offers the possibility of increased complexity.
The methods described have been developed with the aim of keeping transparent control over Type I errors, but power calculators have been implemented in the package as well. Consideration of power can assist researchers in study design and in setting realistic expectations for what these types of statistical hypothesis tests can achieve (McIntosh and Rittmo 2020).
Below is the code for the function powerLMM(), which evaluates power for the linear mixed model used to test for the presence of a dissociation and compares it to the power of either the RSDT or the BSDT, with data distributed as described in Section @ref(lmmexample).
powerLMM <- function(nsim, case_impairment = c(0, 0, 0, 0),
                     compare_against = c("RSDT", "BSDT"), alpha = 0.05) {
  require(lme4)
  require(lmerTest)
  require(MASS)
  require(singcar)
  comptest <- match.arg(compare_against)
  pdt <- numeric(nsim)   # p values for the RSDT/BSDT on aggregated data
  plm <- numeric(nsim)   # p values for the case-by-condition interaction in the LMM
  # 30 participants (the first being the case), 2 items within each of 2 conditions
  simdata <- expand.grid(case = c(1, rep(0, 29)),
                         item = factor(1:2), condition = factor(1:2))
  simdata$ID <- factor(rep(1:30, 4))
  contrasts(simdata$condition) <- contr.sum(2)
  # Control means and covariance matrix for the four item-condition combinations
  mu <- c(100, 80, 50, 20)
  sigma <- matrix(c(15^2, 120, 45, 7,
                    120, 10^2, 11, 15,
                    45, 11, 10^2, 40,
                    7, 15, 40, 5^2), nrow = 4, byrow = TRUE)
  for (i in 1:nsim) {
    dv <- mvrnorm(30, mu = mu, Sigma = sigma)
    dv[1, ] <- dv[1, ] + case_impairment   # add the case's impairment(s)
    simdata$dv <- c(dv)
    model2 <- lmer(dv ~ case*condition + (condition | ID) + (1 | item),
                   data = simdata)
    # Aggregate the two items within each condition for the case-control tests
    agg_data1 <- rowMeans(dv[ , 1:2])
    agg_data2 <- rowMeans(dv[ , 3:4])
    if (comptest == "RSDT") {
      pdt[i] <- RSDT(agg_data1[1], agg_data2[1],
                     agg_data1[-1], agg_data2[-1])[["p.value"]]
    } else {
      pdt[i] <- BSDT(agg_data1[1], agg_data2[1],
                     agg_data1[-1], agg_data2[-1])[["p.value"]]
    }
    plm[i] <- summary(model2)[["coefficients"]][4, 5]   # p value of case:condition
  }
  # Proportion of simulations with p < alpha, i.e. power (or Type I error rate)
  if (comptest == "RSDT") {
    data.frame(RSDT = sum(pdt < alpha)/nsim, LMM = sum(plm < alpha)/nsim)
  } else {
    data.frame(BSDT = sum(pdt < alpha)/nsim, LMM = sum(plm < alpha)/nsim)
  }
}