A hidden Markov model for informative dropout in longitudinal response data with crisis states

https://doi.org/10.1016/j.spl.2011.02.005

Abstract

We adopt a hidden state approach for the analysis of longitudinal data subject to dropout. Motivated by two applied studies, we assume that subjects can move between three states: stable, crisis, dropout. Dropout is observed but the other two states are not. During a possibly transient crisis state both the longitudinal response distribution and the probability of dropout can differ from those for the stable state. We adopt a linear mixed effects model with subject-specific trajectories during stable periods and additional random jumps during crises. We place the model in the context of Rubin’s taxonomy and develop the associated likelihood. The methods are illustrated using the two motivating examples.

Introduction

Longitudinal studies often suffer attrition, in that individuals drop out of the study before the scheduled completion time, and thus present incomplete data. A variety of methods have by now been developed for dealing with the possibility that dropout is related to responses (Hogan et al., 2004, Molenberghs et al., 2004, Philipson et al., 2008, Tsiatis and Davidian, 2004), though caution in using such methods is always needed (Molenberghs et al., 2004, Molenberghs et al., 2008).

Recently le Cessie et al. (2009) recognised that longitudinal data analysis can be complicated by the fact that during follow-up, subjects can change condition or state, examples being remission, relapse and death for cancer patients. Both longitudinal responses and the dropout probability can depend on the current state and this needs to be accounted for in analysis. The methods developed by le Cessie et al. (2009) are appropriate when the underlying state is observed. If a state is defined by the level of a response variable but obscured by measurement error, then the hidden Markov methods of Satten and Longini (1996) or Guihenneuc-Jouyaux et al. (2000) can form a basis for analysis. But, as argued by Liestøl and Andersen (2002), there are circumstances where a subject’s state is either hidden or vaguely defined. For the liver cirrhosis application considered by Liestøl and Andersen (2002) for example, some subjects experienced apparent “crises”, marked by a sudden change in response values. These crises could be transient or could indicate a terminal disease stage.

In this work we build on the ideas of le Cessie et al. (2009) and Liestøl and Andersen (2002) and develop a hidden state modelling approach for longitudinal data subject to dropout. We assume that during follow-up subjects can experience different states, which we will think of as a stable state, a crisis state and dropout. The first two states are transient and reversible, while the third, dropout, is an absorbing state. The crisis state can be defined as an intermediate phase where significant changes of the response values can be observed and where the probability of dropout is increased. We assume that the longitudinal response is associated with the underlying state, but that states other than dropout are not directly observed and perhaps not precisely defined. We exclude situations where the state is defined by the response, such as for AIDS when the CD4 T-cell count first reaches a given level, and leukaemia relapse when a residual leukaemic cell count is over a defined threshold (De Lorenzo et al., 2005).
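The three-state structure described above can be illustrated with a minimal simulation sketch. The transition probabilities, the choice to start every subject in the stable state, and the names `P`, `simulate_states` and `observed` are illustrative assumptions, not values or notation taken from the paper.

```python
import random

STABLE, CRISIS, DROPOUT = 1, 2, 3

# Hypothetical one-step transition probabilities (outer key: from, inner key: to).
# Stable and crisis communicate with each other; dropout is absorbing.
P = {
    STABLE:  {STABLE: 0.85, CRISIS: 0.10, DROPOUT: 0.05},
    CRISIS:  {STABLE: 0.40, CRISIS: 0.30, DROPOUT: 0.30},
    DROPOUT: {STABLE: 0.00, CRISIS: 0.00, DROPOUT: 1.00},
}

def simulate_states(n_times, rng, start=STABLE):
    """Simulate one subject's state path over n_times scheduled assessments.

    Starting in the stable state is an assumption made for this sketch.
    """
    path = [start]
    for _ in range(n_times - 1):
        row = P[path[-1]]
        nxt = rng.choices(list(row), weights=list(row.values()))[0]
        path.append(nxt)
    return path

def observed(path):
    """What the analyst sees: dropout is observed, stable and crisis are not."""
    return ["dropout" if s == DROPOUT else "in study" for s in path]

if __name__ == "__main__":
    rng = random.Random(2011)
    path = simulate_states(6, rng)
    print(path)
    print(observed(path))
```

The `observed` helper makes the hidden-state aspect explicit: two subjects with different stable and crisis histories give identical observed state records until one of them drops out.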

In Section 2 we provide brief details of two applications which motivated our work. Our model is introduced and the estimation outlined in Section 3, where we also argue for the merits of proper treatment of time ordering when considering missingness mechanisms for longitudinal data. Section 4 includes summaries of our analyses of the motivating data sets, and some brief comments in Section 5 conclude the paper.

Section snippets

Schizophrenia data

We consider data from a trial into treatment of schizophrenia, previously described by Henderson et al. (2000) and Diggle et al. (2007). There are three treatment groups (standard, placebo and experimental) and the response of interest is the Positive and Negative Symptom Scale (PANSS), which is high for subjects with poor condition. Six assessments were scheduled, at weeks 0, 1, 2, 4, 6 and 8, but of the 518 subjects under consideration only 269 completed the trial. Of the remainder, 183

The model and assumptions

We will begin with the general situation. We assume a balanced design with common scheduled assessment times for all n subjects recruited into the study. We label the assessment times t = 1, 2, …, T but do not require equal spacing in calendar time. For the moment we will consider a single generic subject and do not use a subscript to differentiate between individuals. We identify two stochastic processes in t: a partly unobservable finite state first-order Markov chain, S_t, and an observable

Schizophrenia data

For the fixed effects component we assume separate quadratic time trends within each of the three treatment groups. We take the logistic model (1) for the probability ϕ_{13}(t) of a direct transition from a stable state to dropout, but we assume that the coefficient of the previous response is time-constant, i.e. α_{1t} = α_1. We assume that transitions out of a crisis state are time-homogeneous: ϕ_{21}(t) = ϕ_{21} and ϕ_{23}(t) = ϕ_{23}.
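As a rough sketch of how these assumptions shape the one-step transition matrix, the function below builds the matrix at a given assessment time. Model (1) is not reproduced in this excerpt, so the linear predictor `alpha0_t + alpha1 * y_prev` is an assumed form. The stable-to-crisis probability is parametrised here as a conditional probability given no dropout purely to keep each row valid; that parametrisation, and all argument names, are hypothetical.

```python
import math

def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def transition_matrix(y_prev, alpha0_t, alpha1, crisis_given_no_dropout, phi21, phi23):
    """One-step transition matrix over (stable, crisis, dropout).

    y_prev                  : previous response (e.g. previous PANSS score)
    alpha0_t                : time-specific intercept (assumed form)
    alpha1                  : time-constant coefficient of the previous response
    crisis_given_no_dropout : hypothetical P(crisis | stable, no dropout)
    phi21, phi23            : time-homogeneous transitions out of crisis
    """
    # Stable -> dropout: logistic in the previous response (assumed linear predictor).
    phi13 = expit(alpha0_t + alpha1 * y_prev)
    # Stable -> crisis: assumption made for this sketch, keeps the row in [0, 1].
    phi12 = (1.0 - phi13) * crisis_given_no_dropout
    phi11 = 1.0 - phi12 - phi13
    # Crisis row: constant over time.
    phi22 = 1.0 - phi21 - phi23
    if phi22 < 0.0:
        raise ValueError("phi21 + phi23 must not exceed 1")
    return [
        [phi11, phi12, phi13],
        [phi21, phi22, phi23],
        [0.0, 0.0, 1.0],  # dropout is absorbing
    ]

if __name__ == "__main__":
    for row in transition_matrix(90.0, -4.0, 0.02, 0.1, 0.4, 0.3):
        print([round(p, 4) for p in row])
```

Only the first row depends on the assessment time and on the previous response; the crisis row is the same at every assessment, which is what time-homogeneity of transitions out of crisis amounts to.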

We performed three analyses of these data: standard maximum likelihood, Bayes

Discussion

We have proposed an approach to modelling longitudinal data subject to dropout which might be useful when there are indications that subjects can have high risk or crisis periods during which the response variable can change dramatically and the probability of dropout be affected. Our model is MAR given complete data filtrations but MNAR given only observed data filtrations. We do not claim that our approach will always be appropriate, but we do consider it potentially useful. In both the

Acknowledgements

We gratefully acknowledge Dr. Ton de Craen and Dr. Rudi Westendorp of the Leiden University Medical Centre, for kindly providing the analysed data. We thank the guest editor and an anonymous reviewer for helpful comments on an earlier version of the manuscript.

References (20)

  • A.B. der Wiel et al.

    A high response is not essential to prevent selection bias: results from the Leiden 85-plus study

    Journal of Clinical Epidemiology

    (2002)
  • K. Christensen et al.

    The quest for genetic determinants of human longevity: challenges and insights

    Nature Reviews Genetics

    (2006)
  • R.B. Davies

    Hypothesis testing when a nuisance parameter is present only under the alternative

    Biometrika

    (1977)
  • R.B. Davies

    Hypothesis testing when a nuisance parameter is present only under the alternative

    Biometrika

    (1987)
  • P. De Lorenzo et al.

    Analysis of interval-censored longitudinal data with application to onco-haematology

    Statistics in Medicine

    (2005)
  • P.J. Diggle
  • P.J. Diggle et al.

    Analysis of longitudinal data with drop-out: objectives, assumptions and a proposal (with discussion)

    Applied Statistics

    (2007)
  • C. Dufouil et al.

    Analysis of longitudinal studies with death and drop-out: a case study

    Statistics in Medicine

    (2004)
  • C. Guihenneuc-Jouyaux et al.

    Modeling markers of disease progression by a hidden Markov process: application to characterizing CD4 cell decline

    Biometrics

    (2000)
  • R. Henderson et al.

    Joint modelling of longitudinal measurements and event time data

    Biostatistics

    (2000)
There are more references available in the full text version of this article.