Serial cognitive assessment is used in clinical practice, clinical trials, and longitudinal studies of aging and dementia to track cognitive fluctuations over time and to identify clinically significant declines in performance suggestive of mild cognitive impairment (MCI) or dementia. Screening measures with specific cut-points reflecting probable cognitive impairment are also frequently used as brief, first-line measures of gross cognitive functioning in both clinical and research settings. For example, patients performing below the cut-point on a screening measure may be referred for more extensive diagnostic evaluation. Research participants may be screened into or out of studies based upon whether their performance lies above or below the cut-point of the measure. When cognitive instruments are used repeatedly, it is imperative to know not only the sensitivity, specificity, positive predictive value, and negative predictive value of the instruments, but also their behavior over time.
Practice effects (PE) represent one aspect of that behavior. PE are distinct from random fluctuations in performance and refer to bias due to familiarity with test items and procedures when a test is retaken. Longitudinal studies of cognitive aging are highly dependent on repeated testing with neuropsychological measures. For example, dementia prevention trials such as the Alzheimer’s Disease Anti-inflammatory Prevention Trial (ADAPT), the Ginkgo Evaluation of Memory Study (GEMS), and the Prevention of Alzheimer’s Disease by Vitamin E and Selenium (PREADViSE) trial rely heavily on repeated cognitive screening measures and standardized cognitive batteries for case ascertainment and for tracking response to treatment.
Most studies demonstrating PE have involved test-retest paradigms over short time intervals [5–9] or have been conducted primarily with impaired populations [10–12]. Nonetheless, repeated testing effects have been well documented [13–17], and performance variability has been shown to be influenced by age [18–21], fluid intelligence, clinical population [10, 22], retest interval [9, 12, 23, 24], and the test or neurocognitive domain assessed [25, 26]. Knowledge of the effects of repeated presentation is essential for interpreting results. For example, PE can alter a measure’s sensitivity to cognitive change and have been found to account for between 31% and 83% of the variance in follow-up test scores. Further, PE could influence dementia detection in prevention trials that rely on screening measures, because PE have been observed on measures such as the Mini-Mental State Exam even among participants with Alzheimer’s disease (AD).
Furthermore, PE may persist over long periods of time. In the UK, Rabbitt and colleagues examined PE over a 17-year period in 5,899 participants aged 49 to 92. Similar to other studies, they found the greatest gain in performance between the first and second presentation, but they also observed gains due to practice on intelligence tests over intervals of several years. In a separate sample studied over a 20-year period, the same authors again observed significant PE, even with time intervals of up to four years. Given these findings, PE may affect whether an individual performs above or below a single cut-point and thus influence case ascertainment in longitudinal clinical trials.
In the present study, we sought to examine PE on the Memory Impairment Screen (MIS) over four annual administrations. Brief memory screening instruments are often used in clinical practice and research to identify patients who might benefit from a more extensive clinical assessment and to determine whether specific individuals should be included in a research study. Some studies, such as the PREADViSE trial, rely on dementia screening measures to determine whether a participant should be evaluated with more in-depth cognitive assessment. If performance on screening measures is influenced by PE, participants who are cognitively impaired or demented may score above the cut-point, be adjudicated as cognitively normal, and thus be misclassified or potentially lost to follow-up. Given previous data on short-term and long-term PE, we hypothesized that MIS scores would improve over time despite efforts to mitigate PE through alternate test versions.