Quantal Response Analysis — Example

John Maindonald, Statistics Research Associates

24 July 2017

Conventions

Data

Hawaiian contemporary data, supplied by Peter Follett, will be used to demonstrate the functions in the qra package. Several different styles of model will be compared. This turns out to be a challenging dataset to work with.

Check fits to species/lifestage combinations

We start by using GLMs to check how well lines fit the individual species/lifestage and species/lifestage/replicate combinations.

Set up data

Plot data

The graphs now shown use a logistic transform for the \(y\)-scale. Responses appear acceptably linear, after the first one or two observations.

For what follows, where at least 7 times with < 100% mortality are available, observations will be restricted to days 6 or later. Otherwise, where at least 5 times with < 100% mortality are available, observations will be restricted to days 4 or later. The points that remain then appear, apart from clear outliers, to be acceptably linear on a logit mortality scale.
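The restriction rule can be written down as a small function. The following is a minimal Python sketch, not code from the qra package: the function names and the example counts are hypothetical, and the treatment of series with fewer than 5 usable times (left unrestricted here) is an assumption, since the text does not say what happens in that case.

```python
def earliest_day(n_times_below_100):
    """First day retained for a series, or None for no restriction.

    n_times_below_100 is the number of observation times at which
    mortality was below 100%.
    """
    if n_times_below_100 >= 7:
        return 6
    if n_times_below_100 >= 5:
        return 4
    return None  # assumption: shorter series are left unrestricted


def restrict(days, dead, total):
    """Keep only the (day, dead, total) rows that the rule allows."""
    n_below = sum(1 for d, n in zip(dead, total) if d < n)
    start = earliest_day(n_below)
    rows = list(zip(days, dead, total))
    if start is None:
        return rows
    return [row for row in rows if row[0] >= start]


# Hypothetical series: 7 of the 8 times have mortality below 100%,
# so only days 6 and later are kept.
days = [2, 4, 6, 8, 10, 12, 14, 16]
dead = [10, 30, 55, 70, 85, 95, 99, 100]
total = [100] * 8
kept = restrict(days, dead, total)
```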

Alternatives that in principle might be used are:

  1. Fit curves, rather than a line. This complicates calculation of confidence intervals for LT values, requiring the development of new code.
  2. Fit a zero-inflated model. I have not to date found any R package that accommodates such models with a binomial or quasibinomial error. There are packages that may be worth investigating; this becomes a research exercise. The function fieller() in this package could be fairly straightforwardly adapted to handle this case.

Alternative 1 will handle a wider range of cases than 2, which models the very specific form of nonlinearity that results from Abbott’s formula type control mortality effects.

Both these alternatives require the estimation of additional parameters. For models that fit curves, a strategy is required for deciding on the family of curves that will be used, and on how the change of curve between treatment groups will be parameterized. For zero-inflated models, a strategy is required for deciding on whether zero-inflation parameters can be assumed common across some treatment groups.

Even where sample-based control mortality estimates are available, these will in general be too inaccurate to use as known, fixed, control mortalities.
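To see why control mortality of the Abbott's formula type produces nonlinearity at early times, it helps to compute a few values. The sketch below is in Python and uses a hypothetical line (intercept -6, slope 1 per day on the logit scale) and a hypothetical control mortality of 5%; none of these numbers come from the data. Observed mortality is taken to be c + (1 - c)p, where p is the treatment-induced mortality and c the control mortality.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def observed_mortality(p_treatment, control):
    """Control deaths, plus treatment deaths among those that remain."""
    return control + (1.0 - control) * p_treatment


def abbott_corrected(p_observed, control):
    """Abbott's formula: recover treatment mortality from observed mortality."""
    return (p_observed - control) / (1.0 - control)


# Hypothetical values, for illustration only.
intercept, slope, control = -6.0, 1.0, 0.05
for day in range(0, 11, 2):
    p_trt = expit(intercept + slope * day)
    p_obs = observed_mortality(p_trt, control)
    # The first logit is linear in day by construction. The second flattens
    # towards logit(control) at early times and approaches the line later.
    print(day, round(logit(p_trt), 2), round(logit(p_obs), 2))
```

The flattening is confined to the earliest times, which is consistent with the choice above to drop the first one or two observations rather than to model them.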

Diagnostic plots

The following shows diagnostic plots, after fitting one line for each species/lifestage/replicate combination. Fitted lines are very clearly different between replicates. As replicates are treated as fixed effects, the models fitted here are not suitable for generalization beyond the data used to fit the model. The models will be used for diagnostic purposes.

We wish to check:

Two types of model will be fitted: a generalised linear model (GLM), and a linear model with \(y = \log(\dfrac{Dead+1/6}{Live+1/6})\).
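The point of adding 1/6 to both counts is that the transformed response stays finite when mortality is 0% or 100%. A minimal Python sketch of the transformation follows; the function name and the example counts are illustrative only.

```python
import math


def empirical_logit(dead, live, offset=1.0 / 6.0):
    """log((Dead + 1/6) / (Live + 1/6)), finite even when a count is zero."""
    return math.log((dead + offset) / (live + offset))


# Hypothetical counts out of 100 insects.
y_half = empirical_logit(50, 50)   # equal counts give exactly 0
y_all = empirical_logit(100, 0)    # 100% mortality still gives a finite value
```

With 100 dead and 0 live, the ratio is (100 + 1/6)/(1/6) = 601, so the response is log(601) rather than infinity.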

Warning: not plotting observations with leverage one:
  70, 71, 76, 77

Diagnostic plots — GLM X model

The “Residuals vs Fitted” plot suggests that a systematic pattern of variation may remain after fitting the line. Note, however, that the fitted smooth may be misleading, given some large outliers and the strong indication in the “Scale-Location” plot that the GLM model is giving too much weight to points at midrange mortalities, and too little weight to high mortality points. The assumption of a constant dispersion is clearly seriously wrong. One answer to the changes in dispersion may be to adjust the GLM weightings.

(The scale-location plot will be close to a horizontal line if points are being correctly weighted. It shows reduced variation about the line as mortalities increase.)

GLM model with prior weights

We investigate adjusting the weights by reversing or partially reversing the effect of the weighting that is implicit in the use of a generalized linear model with quasibinomial errors.
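For a binomial-type GLM with a logit link, the weight that iteratively reweighted least squares gives to an observation of n insects with fitted mortality p is proportional to n p (1 - p), which is largest at midrange mortalities. The Python sketch below shows one way that this weighting could be reversed or partially reversed. The power form of the prior weight is an assumption made for illustration; the source does not state the form of adjustment that was actually used.

```python
def working_weight(n, p, prior=1.0):
    """IRLS working weight for a logit-link binomial-type fit."""
    return prior * n * p * (1.0 - p)


def adjusted_prior_weight(p, power):
    """Hypothetical prior weight: power=1 fully reverses the implicit
    p(1-p) weighting, 0 < power < 1 partially reverses it, and power=0
    leaves the standard GLM weighting unchanged."""
    return (p * (1.0 - p)) ** (-power)


# With n = 100 the unadjusted weight drops from 25 at p = 0.5 to about
# 0.99 at p = 0.99; with power = 1 every point gets weight n.
for p in (0.5, 0.9, 0.99):
    unadjusted = working_weight(100, p)
    half = working_weight(100, p, adjusted_prior_weight(p, 0.5))
    full = working_weight(100, p, adjusted_prior_weight(p, 1.0))
    print(p, round(unadjusted, 2), round(half, 2), round(full, 2))
```

In practice p would have to come from a preliminary fit, which is a further assumption of this sketch.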

Warning: not plotting observations with leverage one:
  70, 71, 76, 77

GLM model, logit link, adjusted weights

Now omit point 52 and repeat the plots:

Warning: not plotting observations with leverage one:
  37, 38, 69, 70, 75, 76

GLM model, logit link, adjusted weights, omitting points 51 and 52

The “Residuals vs Fitted” plot is not as flat as one would like. The span for the smooth may however be set too small for this dataset.

Linear model with \(y = \log(\dfrac{Dead+1/6}{Live+1/6})\)

Again, we fit one line per replicate, using a robust fit in order to downweight the influence of outliers.

Warning: not plotting observations with leverage one:
  70, 71, 76, 77

lm model; y = log((Dead+1/6)/(Live+1/6))

Several points are identified as outliers. The following checks the diagnostic plots that result when they are omitted:

Warning: not plotting observations with leverage one:
  37, 38, 39, 40, 66, 67, 72, 73

lm model; y = log((Dead+1/6)/(Live+1/6))

Mixed model analysis

In the mixed model context, differences in the way that individual lines are fitted mainly affect how efficiently the data are used.

Fit model using rlmer

We fit a robust version of the linear mixed model. The fit allows for a random intercept only, because a model that also allowed for a random slope generated an error. (A guess is that there were too few points to allow a satisfactory fit.)

Diagnostic plots from robust linear mixed model

The largest random effect is a clear outlier.

Fit model using glmer

The following applies the relative weightings that were used for the fit to trtGpRep with adjusted weights:

Lethal Time Estimates — Comparison

The comparison is for X models, with slopes that vary between species/lifestage combinations. Models work either with cTime (centered version of TrtTime), or with scTime (centered and scaled).

  1. A robust linear mixed model, fitted to \(\log(\dfrac{Dead+1/6}{Live+1/6})\). The model has a random intercept only. The information in the data was not sufficient to allow for the fitting of a random slope component of variance.
  2. A GLMM with no adjustment for varying dispersion
    • Model modX.glmer; work with scTime
  3. A GLMM with weight correction
    • Model modXW.glmer; work with scTime
  4. A GLMM with no adjustment for varying dispersion, random intercept only
    • Model modXi.glmer; work with scTime
  5. A GLMM with weight correction, random intercept only
    • Model modXWi.glmer; work with scTime
  6. A GLM model that ignores fixed (and random) replicate specific effects, with adjustment for varying dispersion (this is included largely to warn against its use!)
    • If there are large differences between replicates, this may lead to a very flat line, as with this dataset for Medfly eggs.
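All of the models above lead to an LT99 in the same way: the fitted line is solved for the time at which it reaches logit(0.99), and the result is mapped back from the centered and scaled time to days. The following Python sketch shows the arithmetic. It assumes that scTime is (TrtTime - mean)/sd, with sd the sample standard deviation, and the times and coefficients are hypothetical values, not fitted ones.

```python
import math
import statistics


def logit(p):
    return math.log(p / (1.0 - p))


def centre_and_scale(times):
    """Return mean, sd and the centered, scaled times (assumed scTime)."""
    mean = statistics.mean(times)
    sd = statistics.stdev(times)
    return mean, sd, [(t - mean) / sd for t in times]


def lt_scaled(intercept, slope, p=0.99):
    """Scaled time at which the fitted line reaches logit(p)."""
    return (logit(p) - intercept) / slope


def lt_days(intercept, slope, mean, sd, p=0.99):
    """Map the scaled lethal time back to the original time scale."""
    return mean + sd * lt_scaled(intercept, slope, p)


# Hypothetical treatment times (days) and coefficients on the scTime scale.
times = [2, 4, 6, 8, 10, 12, 14]
mean, sd, sc_time = centre_and_scale(times)
intercept, slope = 1.5, 2.0
lt99 = lt_days(intercept, slope, mean, sd)
```

Because the lethal time is a ratio of two estimated quantities, its sampling distribution is skewed, which is why the intervals reported further on are not symmetric about the estimate.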

Commentary

My preference, following on from my experience in working with disinfestation data in the past year, is Model 1. In work that I undertook prior to around 2000, the choice was to work with some equivalent of Model 2. The preference may depend somewhat on the individual dataset. Additionally, it may be that we were not at that time very practiced with the use of diagnostic plots.

A further possibility is to fit one LT value for each replicate, and then to use those values as the basis for further analysis. Results may be unsatisfactory if the accuracy of the LT99 estimates varies widely between replicates.

LT99 estimates and CIs: Medfly eggs

The calculations that follow assume 16 degrees of freedom for the variance-covariance matrix. There are 8 \(\times\) 3 replicates in all. For each species/lifestage combination, two of the three degrees of freedom are left over after estimating the species/lifestage specific intercept, i.e., there are 8 \(\times\) 2 = 16 degrees of freedom. The same is the case for the (random) estimate. The value 16 is then the smallest of the degrees of freedom for the sources of variability that contribute to the variance-covariance matrix.

The confidence interval calculations have not taken account of the omission of outliers in the analyses, or of the use of robust methods in the analyses. The confidence intervals are on this account likely to be anti-conservative. In principle, one can bootstrap the calculations, but a likely roadblock for the present data is that calculations will fail for at least some of the bootstrap samples.
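An interval for a ratio such as an LT99 can be obtained from Fieller's theorem. The Python sketch below is a generic textbook version, not the code of the fieller() function in the qra package, and all inputs are hypothetical. The critical value 2.120 is the upper 2.5% point of a t distribution with 16 degrees of freedom. When the slope is not clearly different from zero, no bounded interval exists and the sketch returns None. That would be one possible source of NA limits in a table of results, although the source does not say what causes them.

```python
import math

T_975_DF16 = 2.120  # upper 2.5% point of t with 16 degrees of freedom


def fieller_interval(num, den, v_num, v_den, cov, t_crit):
    """Fieller interval for num/den, or None if it is unbounded.

    Solves (num - theta*den)^2 <= t^2 * Var(num - theta*den) for theta.
    """
    g = den * den - t_crit ** 2 * v_den
    if g <= 0:
        return None  # denominator not clearly non-zero
    b = num * den - t_crit ** 2 * cov
    c = num * num - t_crit ** 2 * v_num
    # For g > 0 and a positive definite covariance matrix the
    # discriminant is positive, so the square root is real.
    root = math.sqrt(b * b - g * c)
    return (b - root) / g, (b + root) / g


def lt_interval(intercept, slope, v_int, v_slope, cov_int_slope,
                t_crit=T_975_DF16, p=0.99):
    """Estimate and interval for (logit(p) - intercept) / slope."""
    num = math.log(p / (1.0 - p)) - intercept
    # Subtracting the intercept from a constant flips the covariance sign.
    ci = fieller_interval(num, slope, v_int, v_slope, -cov_int_slope, t_crit)
    return num / slope, ci


# Hypothetical coefficients and variances on the original time scale.
est, ci = lt_interval(-2.0, 0.7, 0.09, 0.0025, -0.01)
est_flat, ci_flat = lt_interval(-2.0, 0.7, 0.09, 0.2, -0.01)
```

The second call uses a much larger slope variance, and illustrates the unbounded case.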

The following are LT99 and approximate confidence intervals:

MedFlyEgg: 
                                    estval    var lower upper
lmm (rlmer), random intercept        10.02  1.167  7.92  12.8
glmer, uncorrected wts                8.89  1.855  6.69  13.5
glmer, corrected weights              9.38  1.445  7.31  12.9
glmer, uncorr wts, random intercept  10.19  0.740  8.42  12.1
glmer, corr wts, random intercept     9.98  0.897  8.15  12.3
GLM: Corr weights                    14.83 12.250 10.55  48.1

MedFlyL1: 
                                    estval   var lower upper
lmm (rlmer), random intercept         6.11 2.208  2.75 10.73
glmer, uncorrected wts                5.36 1.356  3.32  9.75
glmer, corrected weights              5.63 1.597  3.48 10.82
glmer, uncorr wts, random intercept   5.36 0.905  3.51  7.78
glmer, corr wts, random intercept     5.63 1.383  3.53  9.61
GLM: Corr weights                     5.63 2.950    NA    NA

MedFlyL2: 
                                    estval   var lower upper
lmm (rlmer), random intercept         7.58 0.425  6.27  9.14
glmer, uncorrected wts                7.92 0.831  6.35 10.56
glmer, corrected weights              7.68 0.538  6.38  9.75
glmer, uncorr wts, random intercept   7.99 0.304  6.86  9.22
glmer, corr wts, random intercept     7.62 0.373  6.48  9.21
GLM: Corr weights                     7.54 0.890  6.24 12.61

MedFlyL3: 
                                    estval   var lower upper
lmm (rlmer), random intercept         6.14 0.254  5.14  7.35
glmer, uncorrected wts                7.09 0.555  5.78  9.20
glmer, corrected weights              7.01 0.520  5.77  9.15
glmer, uncorr wts, random intercept   6.84 0.212  5.91  7.89
glmer, corr wts, random intercept     6.97 0.395  5.84  8.70
GLM: Corr weights                     6.67 0.757  5.49 11.68

MelonFlyEgg: 
                                    estval    var lower upper
lmm (rlmer), random intercept         3.58 0.3327  2.25  5.05
glmer, uncorrected wts                3.33 0.2173  2.54  5.07
glmer, corrected weights              2.74 0.0546  2.32  3.46
glmer, uncorr wts, random intercept   3.32 0.1910  2.60  4.91
glmer, corr wts, random intercept     2.74 0.0522  2.33  3.43
GLM: Corr weights                     2.74 0.1274  2.29 34.57

MelonFlyL1: 
                                    estval  var lower upper
lmm (rlmer), random intercept         7.89 2.78  3.76  13.5
glmer, uncorrected wts                7.99 6.00  4.55  41.2
glmer, corrected weights              7.84 3.91  4.76  24.8
glmer, uncorr wts, random intercept   7.99 2.32  5.07  12.3
glmer, corr wts, random intercept     7.84 2.76  4.94  15.9
GLM: Corr weights                     7.84 6.18    NA    NA

MelonFlyL2: 
                                    estval   var lower upper
lmm (rlmer), random intercept         11.1  1.88  8.41  14.7
glmer, uncorrected wts                12.3 11.58  7.84  37.1
glmer, corrected weights              12.2  5.84  8.52  22.1
glmer, uncorr wts, random intercept   12.5  1.78  9.74  15.5
glmer, corr wts, random intercept     12.5  2.37  9.62  16.5
GLM: Corr weights                     14.0  7.12 10.58  34.2

MelonFlyL3: 
                                    estval   var lower upper
lmm (rlmer), random intercept         8.55 0.619  7.00  10.5
glmer, uncorrected wts                9.65 2.285  7.25  14.9
glmer, corrected weights              9.73 1.505  7.64  13.4
glmer, uncorr wts, random intercept   9.68 0.591  8.10  11.4
glmer, corr wts, random intercept     9.62 0.798  7.93  11.9
GLM: Corr weights                     9.71 1.629  7.93  15.9

Comparison of results

The GLM model (with corrected weights) is clearly unsatisfactory. The fitting of a random slope does in some cases, for the model fitted using glmer, increase the width of the confidence interval. This happens both with and without the weighting. The same would almost certainly be the case for the rlmer model.

Use of glmer with “corrected” weights does lead to more consistent confidence intervals than for glmer with uncorrected weights.

Models 1 (rlmer with random intercepts only) and 5 (glmer with random intercepts only, corrected weights) give very similar confidence intervals. It is reasonable to expect that rlmer with random intercepts and slopes would give similar results to the weighted glmer model with random intercepts and slopes.

Model 6 is clearly unsatisfactory.

Slope Versus Intercept Plot

Attention is limited to the linear mixed model. 90% confidence ellipses are added.

Slope versus intercept plot. The intercept is a difference from the mean. Contour lines have been added for constant LT99=11 days (labeled with Es), LT99=9 days (labeled with 9s), and LT99=7 days (labeled with 7s).

Hotelling T2

The following uses a Hotelling T2 test to compare the means in the case where the two ellipses (MedFly L1 and MedFly L3) are closest to touching. This calculation requires careful checking. The \(p\)-value should be adjusted upwards for the number of comparisons made.

[1] 5e-04

On Hotelling T2, see, e.g., https://en.wikipedia.org/wiki/Hotelling%27s_T-squared_distribution#Statistic
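One simple way to adjust the \(p\)-value upwards is a Bonferroni correction, which multiplies it by the number of comparisons. The Python sketch below assumes that the relevant number is all pairwise comparisons among the 8 species/lifestage combinations. That count is an assumption made for illustration; a smaller set of comparisons, such as those within a species, would give a smaller adjustment.

```python
from math import comb


def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


n_groups = 8                    # species/lifestage combinations
n_pairs = comb(n_groups, 2)     # 28 pairwise comparisons (assumed)
adjusted = bonferroni(5e-04, n_pairs)
```

Under that assumption the adjusted value is 28 \(\times\) 0.0005 = 0.014.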