Skip to main content
Log in

Modelling grouped survival times in toxicological studies using Generalized Additive Models

  • Published:
Environmental and Ecological Statistics Aims and scope Submit manuscript

Abstract

A method for combining a proportional-hazards survival time model with a bioassay model where the log-hazard function is modelled as a linear or smoothing spline function of log-concentration combined with a smoothing spline function of time is described. The combined model is fitted to mortality numbers, resulting from survival times that are grouped due to a common set of observation times, using Generalized Additive Models (GAMs). The GAM fits mortalities as conditional binomials using an approximation to the log of the integral of the hazard function and is implemented using freely-available, general software for fitting GAMs. Extensions of the GAM are described to allow random effects to be fitted and to allow for time-varying concentrations by replacing time with a calibrated cumulative exposure variable with calibration parameter estimated using profile likelihood. The models are demonstrated using data from a studies of a marine and a, previously published, freshwater taxa. The marine study involved two replicate bioassays of the effect of zinc exposure on survival of an Antarctic amphipod, Orchomenella pinguides. The other example modelled survival of the daphnid, Daphnia magna, exposed to potassium dichromate and was fitted by both the GAM and the process-based DEBtox model. The GAM fitted with a cubic regression spline in time gave a 61 % improvement in fit to the daphnid data compared to DEBtox due to a non-monotonic hazard function. A simulation study using each of these hazard functions as operating models demonstrated that the GAM is overall more accurate in recovering lethal concentration values across the range of forms of the underlying hazard function compared to DEBtox and standard multiple endpoint probit analyses.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

References

  • Aitken M, Clayton DG (1980) The fitting of exponential, Weibull and extreme value distributions to complex censored survival data using GLIM. J R Stat Soc C-App 29:56–63

    Google Scholar 

  • Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F (eds) Proceedings of the second international symposium on information theory. Akademinai Kiado, Budapest, pp 267–281

    Google Scholar 

  • Ashauer R, Boxall ABA, Brown CD (2007) New ecotoxicological model to simulate survival of aquatic invertebrates after exposure to fluctuating and sequential pulses of pesticides. Environ Sci Technol 41:1480–1486

    Article  CAS  PubMed  Google Scholar 

  • Bedaux JJM, Koojiman SALM (1994) Statistical analysis of bioassays, based on hazard modelling. Environ Ecol Stat 1:303–314

    Article  Google Scholar 

  • Breslow NE, Clayton DG (1993) Approximate inference in generalized linear mixed models. J Am Stat Assoc 88:9–25

    Google Scholar 

  • Candy SG (1986) Fitting a parametric log-linear hazard function to grouped survival data. GLIM Newsl 13:28–31

    Google Scholar 

  • Candy SG (1989) Growth and yield models for Pinus radiata in Tasmania. N Z J For Sci 19:12–133

    Google Scholar 

  • Candy SG (2002) Empirical binomial sampling plans: model calibration and testing using William’s method III for generalised linear models with overdispersion. JABES 7:373–388

    Article  Google Scholar 

  • Cox DR, Oakes D (1984) Analysis of survival data. Chapman & Hall, London

    Google Scholar 

  • Cox C (1988) Multinomial regression models based on continuation ratios. Stat Med 7:435–441

    Article  CAS  PubMed  Google Scholar 

  • Davidian M, Giltinan DM (2003) Nonlinear models for repeated measurement data: an overview and update. JABES 8:387–419

    Article  Google Scholar 

  • Draper NR, Smith H (1998) Applied regression analysis, 3rd edn. Wiley, New York

    Google Scholar 

  • Egli P, Schmid B (2001) The analysis of complex leaf survival data. Basic Appl Ecol 2:223–231

    Article  Google Scholar 

  • Fienberg SE (1980) The analysis of cross-classified categorical data, 2nd edn. M.I.T University Press, Cambridge

    Google Scholar 

  • Finney DJ (1971) Statistical method in biological assay, 3rd edn. Griffith, London

    Google Scholar 

  • Gay DM (1990) Usage Summary for Selected Optimization Routines. Comput Sci Tech Rep No. 153. AT&T Bell Laboratories Murray Hill, NJ 07974. (http://netlib.bell-labs.com/cm/cs/cstr/153.pdf)

  • Gilks WR, Richardson S, Spiegelhalter DJ (1996) Markov Chain Monte Carlo in practice. CRC, Boca Raton

    Google Scholar 

  • Green PJ (1987) Penalized likelihood for general semi-parametric regression models. Int Stat Rev 55:245–259

    Article  Google Scholar 

  • Gurland J, Sethuraman J (1995) How pooling failure data may reverse increasing failure rates. J Am Stat Assoc 432:1416–1423

    Article  Google Scholar 

  • Heckmann LH, Baas J, Jager T (2010) Time is of the essence. Environ Toxicol Chem 29:1396–1398

    CAS  PubMed  Google Scholar 

  • Jager T, Heugens EHW, Kooijman SALM (2006) Making sense of ecotoxicological test results: towards application of process-based models. Ecotoxicology 15:305–314

    Article  CAS  PubMed  Google Scholar 

  • Kauermann G (2005) Penalized spline smoothing in multivariable survival models with varying coefficients. Comput Stat Data Anal 49:169–186

    Article  Google Scholar 

  • Kooijman SALM, Bedaux JJM (1996) The analysis of aquatic toxicology data. VU University Press, Amsterdam

    Google Scholar 

  • Lee Y, Nelder JA (1996) Hierarchical generalized linear models (with discussion). J R Stat Soc B 58:619–678

    Google Scholar 

  • Luger M, Brandt B, Bedaux J, Kooijman B (2004) DEBtox/DEBdeg v 2.0.1. Vrije Universiteit, Amsterdam, Netherlands (http://www.bio.vu.nl/thb/)

  • McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman & Hall, London

    Book  Google Scholar 

  • McCulloch CE (1997) Maximum likelihood algorithms for generalized linear mixed models. J Am Stat Assoc 92:162–170

    Article  Google Scholar 

  • Pan J, Thompson R (2007) Quasi-Monte Carlo estimation in generalized linear mixed models. Comput Stat Data Anal 51:5765–5775

    Article  Google Scholar 

  • Péry ARR, Bedaux JJM, Zonneveld C, Kooijman SALM (2001) Analysis of bioassays with time-varying concentrations. Water Res 35:3825–3832

    Article  PubMed  Google Scholar 

  • Pinheiro J, Bates D, DebRoy S, Sarkar D, R Development Core Team (2013) nlme: linear and nonlinear mixed effects models. R package version 3.1-111

  • R Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (http://www.R-project.org)

  • Schall R (1991) Estimation in generalized linear models with random effects. Biometrika 78:719–727

    Article  Google Scholar 

  • Sfiligoj BJ (2013) Improved methods for determining the toxicity of metals to the Antarctic amphipod, Orchomenella pinguides by joint modelling of survival response to concentration level and exposure duration. PhD thesis, Deakin University, Australia

  • Skrondal A, Rabe-Hesketh S (2009) Prediction in multilevel generalized linear models. J R Stat Soc A Stat 172:659–687

    Article  Google Scholar 

  • Vaida F, Xu R (2000) Proportional hazards model with random effects. Stat Med 19:3309–3324

    Article  CAS  PubMed  Google Scholar 

  • Verbyla AP, Cullis BR, Kenward MG, Welham SJ (1999) The analysis of designed experiments and longitudinal data using smoothing splines (with discussion). J R Stat Soc C-App 48:269–311

    Article  Google Scholar 

  • Williams DA (1986) Interval estimation of the median lethal dose. Biometrics 42:641–645

    Article  CAS  PubMed  Google Scholar 

  • Wood SN (2003) Thin plate regression splines. J R Stat Soc B 65:95–114

    Article  Google Scholar 

  • Wood SN (2004) Stable and efficient multiple smoothing parameter estimation for generalized additive models. J Am Stat Assoc 99:673–686

    Article  Google Scholar 

  • Wood SN (2006) Generalized additive models. CRC, Boca Raton

    Google Scholar 

  • Wood SN (2011) Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J R Stat Soc B 73:3–36

    Article  Google Scholar 

Download references

Acknowledgments

We are grateful to Professor Simon Wood for assistance in understanding the workings of the gam and gamm functions in his mgcv package. The constructive reviews by the Associate Editor and the referees contributed substantially to improvements in this work and its presentation. This research was partly funded by the Australian Antarctic Division through AAS Project 2933 (CI King).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to S. G. Candy.

Additional information

Handling Editor: Pierre Dutilleul.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (docx 59 KB)

Appendices

Appendix 1: Product multinomial and conditional binomial log-likelihoods

Bedaux and Koojiman (1994) fit the DEBtox model and its variants using maximum likelihood with a product multinomial likelihood specification. Maximum likelihood estimation in DEBtox is obtained by minimisation of the \(-\)2 log-likelihood function, \(L\left( {n_{ij} \left| {N_0 ,{\varvec{\upbeta }}} \right. } \right) \) where

$$\begin{aligned} L\left( {n_{ij} \left| {N_0 ,{\varvec{\upbeta }}} \right. } \right) =-2\sum \nolimits _{j=1}^J {\sum \nolimits _{i=1}^{I+1} {n_{ij} \log \left( {P_{ij} } \right) } } +C_{PM} \end{aligned}$$
(8)

and

$$\begin{aligned} P_{ij}&= E\left( {n_{ij} \left| {N_0 ,{\varvec{\upbeta }}} \right. } \right) /N_0 \qquad ; i=1,\ldots ,I \\&= 1-\sum \nolimits _{{i}'=1}^I {P_{{i}'j} } \qquad \quad ; i=I+1,\\ \end{aligned}$$

\(n_{I+1,j} =N_0 -\sum \nolimits _{i=1}^I {n_{ij} } \) and \(C_{PM} =-2\sum _{j=1}^J {\log \left( {N_0 !/\left\{ {\Pi _{i=1}^{I+1} n_{ij} !} \right\} } \right) } \). The deviance is defined as \(D\left( {\varvec{\upbeta }} \right) =L\left( {n_{ij} \left| {N_0 ,{\varvec{\upbeta }}} \right. } \right) -L\left( {n_{ij} \left| {N_0 ,{\varvec{\upbeta }}_{sat} } \right. } \right) \)where \({\varvec{\upbeta }}_{sat}\) is the “saturated” parameter set that gives exact predictions of observed mortalities, so that \(\hat{{P}}_{ij} =n_{ij} /N_0\). Note that in this case for values of \(n_{ij} =0\) the term \(\log \left( {P_{ij} } \right) \) in the above formula is not calculated but simply set to zero or an arbitrary constant and zero values of \(n_{ij}\) contribute zero to the value of the deviance. The residual deviance is defined as \(D\left( {\varvec{\upbeta }} \right) \) when \({\varvec{\upbeta }}\) parameterizes the most complex model.

For the GAM fit, the conditional -2 log-likelihood, \(L\left( {n_{ij} \left| {N_{i-1,j} ,{\varvec{\upbeta }}} \right. } \right) \), is minimised to give conditional maximum likelihood estimation where for values of \(n_{ij}\) for which \(N_{ij} >0\)

$$\begin{aligned} L\left( {n_{ij} \left| {N_{i-1,j} ,{\varvec{\upbeta }}} \right. } \right) =-2\left[ {\sum \nolimits _{j=1}^J {\sum \nolimits _{i=1}^I {\left[ {n_{ij} \log \left( {Q_{ij} } \right) +\left( {N_{i-1,j} -n_{ij} } \right) \log \left( {1-Q_{ij} } \right) } \right] } } } \right] +C_{CB}\nonumber \\ \end{aligned}$$
(9)

where

$$\begin{aligned} \begin{array}{ll@{\quad }l} &{}N_{ij}=N_0 -\sum \nolimits _{{i}'=1}^{i-1} {n_{{i}'j} } &{}\qquad ; i=2,\ldots ,I\\ &{} \quad = N_0 &{}\qquad ; i=1,\\ \end{array} \end{aligned}$$

\(Q_{ij} =1-\exp \left( {-\exp \left\{ {\log \left[ {h_j \left( {{t}'_i ,{\varvec{\upbeta }}} \right) } \right] +\log \left( {t_i -t_{i-1} } \right) } \right\} } \right) ,\) where \(h_j \left( {{t}'_i ,{\varvec{\upbeta }}} \right) \) is given by model (6), and \(C_{CB} =-2\sum \nolimits _{j=1}^J {\log \left( {\Pi _{i=1}^I \left[ {N_{i-1,j} !/\left\{ {n_{ij} !\left( {N_{i-1,j} -n_{ij} } \right) !} \right\} } \right] } \right) }\).

Values of \(N_{i-1,j}\) of zero and corresponding zero values of \(n_{ij}\) are excluded from the calculation of the deviance (i.e. \(I_j \le I)\) since they do not contribute to the deviance irrespective of the value of \(\hat{{n}}_{ij}\) whereas zero values of \(n_{ij}\) with corresponding non-zero values of \(N_{i-1,j}\) do contribute and so are retained. The “null” deviance can be calculated using (9) for the fit of the GLM defined with simply the intercept as the only estimated parameter. The deviance for the GAM model can be expressed as the proportional reduction from this “null” deviance and denoted “proportion deviance explained” (PDE) analogously to the coefficient of determination, or \(R^{2}\), in ordinary least squares regression.

Note that log-likelihoods (8) and (9) are equivalent (Fienberg 1980; Cox 1988). Therefore it follows that for the GAM fit \(L\left( {n_{ij} \left| {N_{i-1,j} ,{\varvec{\upbeta }}} \right. } \right) \) is identical to \(L\left( {n_{ij} \left| {N_0 ,{\varvec{\upbeta }}} \right. } \right) \) if the \(P_{ij}\) used in this last quantity are calculated using GAM fitted values of \(\hat{{n}}_{ij} =N_{i-1,j} \hat{{Q}}_{ij}\) as described next. These values of \(P_{ij}\) are obtained by the following recursive formula

$$\begin{aligned} \begin{array}{lll} &{}P_{ij}= \hat{{n}}_{ij} /N_{0 } &{}\quad ; i=1\\ &{}\quad =\left\{ {1-\sum \limits _{{i}'=1}^{i-1} P_{{i}'j} } \right\} \hat{{n}}_{ij} /N_{i-1,j} &{}\quad ; i=2,\ldots ,I_j -1\\ &{}\quad =\left\{ {1-\sum \limits _{{i}'=1}^{I_j -1} P_{{i}'j} } \right\} &{}\quad ; i=I_j . \end{array} \end{aligned}$$
(10)

The deviances are also therefore identical where the conditional binomial deviance is defined as \(D\left( {\varvec{\upbeta }} \right) =L\left( {n_{ij} \left| {N_{i-1,j} ,{\varvec{\upbeta }}} \right. } \right) -L\left( {n_{ij} \left| {N_{i-1,j} ,{\varvec{\upbeta }}_{sat} } \right. } \right) \). The \(P_{ij}\) are obtained by Eq. (10) using the function GAM_UCpredict in Supplementary Material. A more accurate estimate can be obtained by integrating the hazard function using daily time steps within each period, given that \(P_{ij} =S_j \left( {t_{i-1} } \right) -S_j \left( {t_i } \right) \) (using the function GAM_SurvPCLs), however, \(P_{ij}\) calculated using Eq. (10) was used to calculated deviances because it incorporates any inaccuracy in the midpoint approximation in the measure of goodness-of-fit.

Appendix 2: Hazard function parameter estimation

Estimation of \({\varvec{\upbeta }}\) in the DEBtox model (4) involves minimising the \(L\left( {n_{ij} \left| {N_0 ,{\varvec{\upbeta }}} \right. } \right) \) using general optimisation software. Here we carry out the optimisation using both the DEBtox software (Luger et al. 2004) and using the nlminb function in the R software (R Core Team 2013) which uses the PORT routines (Gay 1990) and our implementation uses numerical approximation to gradients. Estimation of \(\varvec{\upbeta }\) for the GAM implemented in the gam function of the R package mgcv for a fixed value of the smoothing parameter minimises the deviance,\(D\left( {\varvec{\upbeta }} \right) \), as described above, when additionally a quadratic penalty term in the \(\beta \)’s associated with the smooth term is scaled by the smoothing parameter, [see Equation 3 of Wood (2011)]. The smoothing parameter estimation algorithm applied by gam for a binomial is Un-Biased Risk Estimator (UBRE) but other algorithms are available. This default option was used in the application of gam in this study. The extension of the GAM to incorporate random effects was given by model (7) which is a member of the class of Generalized Additive Mixed Models (GAMMs) and is fitted using the gamm function of the R package mgcv. The gamm function iterates between calls to the gam and lme functions where this last function is part of the nlme R package. The lme function fits a linear mixed model to working response and working predictor variables supplied by gam in order to estimate variance components using Restricted Maximum Likelihood (REML). This involves minimising the sum of the deviance \(D\left( {\varvec{\upbeta }} \right) \), after scaling by dividing by the dispersion parameter \(\varphi \), and the term \({\varvec{\nu }}^{T}\mathbf{D}^{-1} {\varvec{\nu }}\) where the linear predictor of the GAMM includes the vector of random effects, \({\varvec{\nu }}\). These are assumed independently and normally distributed so that \(\mathbf{D}\) is the diagonal matrix with variances of the random effects on the diagonal with these variances structured to represent variance due to random main effects (e.g. replicate bioassays) or random interactions (e.g. concentration by day factor random effects). This algorithm is described as maximum penalised quasi-likelihood (PQL) (Green 1987; Schall 1991; Breslow and Clayton 1993) for generalized linear mixed models (GLMMs).

Appendix 3: Approximate Fieller intervals for lethal concentrations

For the GAM let \(\mathbf{X}\) be the ‘working’ design matrix with row dimension \({I}'\) corresponding to the sequence (using daily time steps) of 0.5, 1.5, up to reference day \({I}'-0.5\) and column dimension is \(p\) the length of the vector \({\varvec{\upbeta }}\) including the parameters associated with the smooth term. The iterative search to estimate the LC, \({C}'\), corresponding to cumulative probability of mortality to day \({I}'\) of \({P}'=1-S_{{I}'}\), involves first estimating \({P}'\). If \(\mathbf{L}=\hbox {diag}\left\{ {\exp \left( {\mathbf{X}_{{C}'} \hat{\varvec{\upbeta }}} \right) } \right\} \) then \(\hat{{{P}'}}=1-\exp \left( {-\mathbf{a}^{T}\mathbf{La}} \right) \) where \(\mathbf{X}_{{C}'} \) has the column corresponding to \(\beta _2\) in Eq. (6) set to \(\log \left( {{C}'} \right) \) where \({C}'\) is the estimate from the previous iteration and \(\mathbf{a}\) is the \({I}'\)-length column vector of 1’s. The final estimate of \({C}'\), \(\hat{{{C}'}}\), is obtained when \(\delta =\left| {\hat{{{P}'}}-{P}'} \right| \) in the iterative search converges to a value sufficiently close to zero (see function GAMLCs in Supplementary Material which uses the nlminb optimisation function).

To calculate approximate \(100\left( {1-\alpha } \right) \)% Fieller intervals (FI) for \(\hat{{{C}'}}\), the lower and upper Fieller limits (FLs) were determined using the above procedure with \(\hat{{{P}'}}\) replaced by \(\hat{{{P}'}}_{\alpha /2}\) and \(\hat{{{P}'}}_{1-\alpha /2}\) confidence limits, respectively, where

$$\begin{aligned}&\hat{{{P}'}}_{\alpha /2} =1-\exp \left\{ {-\left[ {\mathbf{a}^{T}\mathbf{La} + z_{\alpha /2} \left( {\mathbf{a}^{T}\mathbf{Va}} \right) ^{\frac{1}{2}}} \right] } \right\} ,\\&\qquad \hat{{{P}'}}_{1-\alpha /2} =1-\exp \left\{ {-\left[ {\mathbf{a}^{T}\mathbf{La} + z_{1-\alpha /2} \left( {\mathbf{a}^{T}\mathbf{Va}} \right) ^{\frac{1}{2}}} \right] } \right\} , \end{aligned}$$

\(\mathbf{V}=\mathbf{L}\hat{{\Sigma }}_{\hat{\varvec{\upbeta }}} \mathbf{L}, \quad Var\left( {\hat{\varvec{\upbeta }}} \right) =\Sigma _{\hat{\varvec{\upbeta }}},\) and \(z_{\alpha /2}\) is the \(100\alpha /2\,\%\) critical value for a standard normal distribution (see function GAMLCs in Supplementary Material). The variance–covariance matrix V is an approximation corresponding to a first-order Taylor series approximation. The \(z_{\alpha /2}\) can be replaced by a t-distribution critical value, \(t_{f,\alpha /2}\), where \(f\) is the residual degrees of freedom, however, the results from the simulation study (Table 2) showed that both the nominal value of \({P}'\) and the value of \(N_0\) are important in determining the adequacy of the coverage of the above Fieller interval.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Candy, S.G., Sfiligoj, B.J., King, C.K. et al. Modelling grouped survival times in toxicological studies using Generalized Additive Models. Environ Ecol Stat 22, 465–491 (2015). https://doi.org/10.1007/s10651-014-0306-3

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10651-014-0306-3

Keywords

Navigation