Statistical analysis plan for a cluster randomised trial in Madhya Pradesh, India: support to rural India’s public education system and impact on numeracy and literacy scores (STRIPES2)

A Study protocol to this article was published on 25 June 2020

Abstract

Background

India has made steady progress in improving rates of primary school enrolment but levels of learning achievement remain low. The Support To Rural India’s Public Education System (STRIPES) trial provided evidence that an after-school para-teacher intervention improved numeracy and literacy levels in Telangana, India. The STRIPES2 trial investigates whether such an intervention will have a similar effect on the literacy and numeracy of primary school age children in the Satna District of Madhya Pradesh, India.

Methods/design

The STRIPES2 trial forms one part of a cluster-randomised controlled trial with villages (clusters) randomised to receive either a health (CHAMPION2) or education (STRIPES2) intervention. Building on the design of the earlier CHAMPION/STRIPES trial, villages receiving the health intervention are controls for the education intervention and vice versa. The primary outcome is a combined literacy and numeracy score. Secondary outcomes include separate scores for literacy and numeracy; caregivers’ engagement with child’s learning; expenditure on education; enrolment in school; caregiver’s report of school attendance and the cost effectiveness of the intervention. Over 7000 primary school age children have been recruited and randomised in STRIPES2.

Discussion

This update to the published trial protocol gives a detailed plan for the statistical analysis of the STRIPES 2 trial.

Trial registration

Registry of India: CTRI/2019/05/019296. Registered on 23 May 2019. http://www.ctri.nic.in/Clinicaltrials/pdf_generate.php?trialid=31198&EncHid=&modid=&compid=%27,%2731198det%27

Introduction

Background and rationale

India has made steady progress in improving rates of primary school enrolment. In rural areas, about 97% of children between 6 and 14 years of age are now in school [1]. The levels of learning achievement, however, remain low. The 2018 Annual Status of Education Report (ASER) survey showed that proficiency in reading and numeracy is worryingly low and Indian children may spend several years in school without learning even the basic skills in literacy and numeracy [1]. The STRIPES trial and subsequent SCORE trial intervention demonstrated important results in improving numeracy and language scores in Telangana, India [2] and rural Gambia [3]. The STRIPES 2 trial [4] investigates whether such an intervention will have a similar effect on the literacy and numeracy of primary school age children in Satna District of Madhya Pradesh, India.

Objectives

The primary objective is to assess whether the success of the STRIPES and SCORE trials in providing an after-school para-teacher intervention to raise learning levels among primary school students in rural India and rural Gambia can be replicated in Satna district of Madhya Pradesh, India.

The primary outcome is a combined literacy and numeracy score. Secondary outcomes include separate scores for literacy and numeracy; caregivers’ engagement with child’s learning; expenditure on education; enrolment in school; caregiver’s report of school attendance and the cost effectiveness of the intervention.

Study methods

Trial design

This is a cluster-randomised controlled trial where the recruited clusters are villages in the Satna district of Madhya Pradesh, India. The villages included satisfied the following criteria:

  1. Were considered rural, with a population of fewer than 2500 and with more than 120 children under the age of 6 years;

  2. Were accessible by road;

  3. Were not within a 5 km radius of the Community Health Centres (as such villages are already well served by the local health services);

  4. Had a minimum of 3 km between village centres, such buffer zones being included to minimise contamination.

From a baseline survey conducted between July 2017 and January 2018 we enrolled children born between 16 June 2010 and 15 June 2013 whose caregivers were planning to enrol them in the first grade, for the first time, in the 2018–2019 school year in eligible villages. Before randomisation of villages, from April to June 2019, we conducted a catch-up enumeration in all the selected villages to enrol eligible children who were missed during the baseline enumeration (this included some children who were by this time attending school). Villages were allocated in a 1:1 ratio to either the intervention (a programme provided by Pratham intended to deliver remedial out-of-school lessons, focusing on literacy and numeracy, 6 days a week, 2 hours a day for 17 months) or control.

Planned daily classes were temporarily stopped, in compliance with government measures to reduce COVID-19 transmission, from April to December 2020 and from May to June 2021. The intervention was restarted with modifications in line with local COVID-19 guidelines, such as daily small-group classes and weekly classes for children who could not attend daily. The intervention period was also extended by 12 months, ending in June 2022.

Between 24th July and 19th September 2022 participant children in both trial arms were tested with the Early Grade Reading Assessment (EGRA) [5] and the Early Grade Mathematics Assessment (EGMA) [6], both adapted to the local language and context. After the testing all the children were given a small set of school materials as recompense for their time.

Randomisation

Randomisation of clusters was performed by the trial statistician based in London in June 2019 using a random number generator, with stratification by village size and distance to the nearest Community Health Centre or Civil Hospital.
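The allocation step can be illustrated with a minimal sketch. It assumes that each village carries a single stratum label combining the village size and distance categories, and that allocation is 1:1 within each stratum by random shuffling. The function name, the stratum labels, the seed and the handling of odd-sized strata are all hypothetical, and this is not the trial statistician's actual procedure.

```python
import random
from collections import defaultdict

def stratified_allocation(villages, seed):
    """Allocate villages 1:1 to intervention or control within strata.

    `villages` is a list of (village_id, stratum) pairs. The stratum label is
    assumed to combine the village-size and distance categories.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    by_stratum = defaultdict(list)
    for village_id, stratum in villages:
        by_stratum[stratum].append(village_id)

    allocation = {}
    for stratum in sorted(by_stratum):
        ids = sorted(by_stratum[stratum])
        rng.shuffle(ids)
        half = len(ids) // 2
        # Assumption: in an odd-sized stratum a coin flip decides which arm
        # receives the extra village.
        if len(ids) % 2 == 1 and rng.random() < 0.5:
            half += 1
        for position, village_id in enumerate(ids):
            allocation[village_id] = "intervention" if position < half else "control"
    return allocation

# Hypothetical example: 8 villages in 2 strata of 4.
example = [(f"V{i}", "small/near" if i < 4 else "large/far") for i in range(8)]
arms = stratified_allocation(example, seed=2019)
```

Shuffling within strata keeps the arms balanced on the stratification factors, which is why the same factors then appear as covariates in the analysis model.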

Sample size

The relevant parts of the original sample size calculation as published in the protocol were as follows.

Originally it had been the intention to randomise 300 villages, because this gave over 90% statistical power to detect a difference of 0.25 standard deviations in mean standardised test scores in STRIPES 2. However, incorporating the buffer zones described in the village selection procedure above meant that only 204 villages could be selected. These 204 villages have a mean population of 1487 (minimum 558, maximum 2490) and a standard deviation of 505 (equating to a coefficient of variation of 0.34). Estimating the number of children in each school year from the number under the age of six years old (divided by 6), the mean number of children in each school year is 38.3 (minimum 20, maximum 71) with a standard deviation of 13.3 (a coefficient of variation of 0.35). Assuming that 25% of the children will not satisfy the eligibility criteria, this gives an estimated mean number of eligible children per village of 28.7 with a minimum of 15.

We estimated that the 204 villages will include an average of 28.7 eligible students. In the STRIPES trial the estimated effect was a 0.75 SD increase in mean score: however, effects of smaller magnitude than this would still be important to detect. Conservatively assuming that 60% of the eligible children will take the test at the end of the trial, and an intra-cluster correlation coefficient of 0.23 (as seen in the STRIPES trial [2]), then a trial with 194 villages (i.e. assuming that 5% of the 204 villages will not take part) will give 88% power to detect a difference of 0.25 SD in mean standardised scores between intervention and control villages using a conventional 2-sided statistical significance level of 5% (assuming a coefficient of variation in numbers taking the test by village of 0.35). If the treatment effect is of the order of that seen in the STRIPES trial then there will be reasonable statistical power to explore interactions by ethnicity, gender, wealth and geographic location.

As described above, in the sample size calculation we anticipated that 194 of the 204 villages would be randomised. In fact, 196 were randomised, as 6 villages were removed since they were found to be too close to urban areas to be considered rural, and 2 removed because insufficient eligible children were found. Over 7000 children were enumerated in the randomised villages, with over 6000 children taking the test at the end of follow-up.
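The figures in the original sample size calculation can be checked approximately with a short sketch. It assumes a normal approximation and the commonly used design effect for unequal cluster sizes, 1 + ((cv² + 1)m − 1)ρ. The protocol does not state which formula was used, so both the formula and the function names below are assumptions.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cluster_trial_power(n_clusters, mean_tested, cv, icc, effect_sd, z_alpha=1.959964):
    """Approximate power for a two-arm cluster randomised trial (1:1 allocation).

    The outcome is in standard deviation units, so the individual-level
    variance is 1. The design effect formula is an assumed approximation.
    """
    design_effect = 1.0 + ((cv ** 2 + 1.0) * mean_tested - 1.0) * icc
    n_per_arm = n_clusters * mean_tested / 2.0
    effective_n_per_arm = n_per_arm / design_effect
    se_difference = math.sqrt(2.0 / effective_n_per_arm)
    return normal_cdf(effect_sd / se_difference - z_alpha)

# Inputs taken from the text above.
mean_eligible = 38.3 * 0.75   # 25% assumed ineligible, giving about 28.7
mean_tested = 28.7 * 0.60     # 60% assumed to take the endline test
power = cluster_trial_power(194, mean_tested, cv=0.35, icc=0.23, effect_sd=0.25)
print(f"eligible per village: {mean_eligible:.1f}; approximate power: {power:.3f}")
```

Under these assumptions the result is close to the 88% quoted in the protocol. Small differences would be expected if the original calculation used a different approximation.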

Framework

The trial will use a superiority hypothesis testing framework.

Statistical interim analyses and stopping guidance

As no potential harms are anticipated from this intervention, there is no Data Monitoring Committee, interim analyses or stopping rules.

Timing of final analysis

May 2023 to August 2023.

Timing of outcome assessments

The primary outcome (the endline composite mathematics and language score) was assessed through endline tests (EGRA and EGMA) carried out between 24th July and 19th September 2022.

Additional data collection was carried out as follows:

  • Between January and February 2022, a midline test was carried out with the children to assess basic reading and mathematics levels using an ASER-like exam.

  • Between February and April 2022, a midline survey was carried out with the caregivers to record enrolment, reported attendance and educational support during the period that schools were closed.

  • In November and December 2022, a final survey was carried out to record changes in school enrolment and reported attendance, and caregivers’ support to child’s education.

  • Throughout the trial, data on attendance in classes in the intervention arm were collected by Pratham.

Statistical principles

Level of statistical significance

5%

Adjustments for multiplicity

None (not applicable).

Confidence intervals to be reported

Yes, 95% confidence intervals.

Definition of adherence to the intervention and how this is assessed including extent of exposure

Villages did not all run the intervention classes in the same way. There was variability in the number of planned classes per week, the length of these and the size of classes. Also, some children who lived far from classes in their village could not be reached. This was further complicated by COVID-19, when schools were closed and no after-school classes were running. This makes calculation of measures of adherence challenging. For simplicity we will use counts of the numbers of classes i) offered to and ii) attended by each child. We also assume that, had the intervention run as planned, each child would have been offered 360 classes (6 classes a week for 60 weeks, corresponding approximately to a 17-month period with allowance for holidays etc.). We refer to this as the ideal number of classes.

For the jth child in the ith village we will calculate, over the full follow-up period, i) the total number of classes that were offered to that child (\({O}_{ij}\)) and ii) the total number of classes that that child attended (\({A}_{ij}\)).

At child level we will define adherence in three ways.

  a) Attended as a proportion of ideal (\({A}_{ij}/360\)).

  b) Offered as a proportion of ideal (\({O}_{ij}/360\)).

  c) Attended as a proportion of offered (\({A}_{ij}/{O}_{ij}\)).

At village level, using \({N}_{i}\) to denote the number of children in the ith village, we will define adherence in the same three ways.

  a) Attended as a proportion of ideal \(\left({\sum }_{j}{A}_{ij}\right)/\left(360{N}_{i}\right)\).

  b) Offered as a proportion of ideal \(\left({\sum }_{j}{O}_{ij}\right)/\left(360{N}_{i}\right)\).

  c) Attended as a proportion of offered \(\left({\sum }_{j}{A}_{ij}\right)/\left({\sum }_{j}{O}_{ij}\right)\).

Each measure will be summarised using means and standard deviations, and in a contingency table with adherence bands of 0, > 0 to 25%, > 25% to 50%, > 50% to 75% and > 75% to 100% (Table 1).
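The adherence measures and bands defined above can be sketched as follows. The function names are hypothetical. Two choices are assumptions not specified in this plan: "attended as a proportion of offered" is returned as None when no classes were offered, and values above 100% are placed in the top band.

```python
IDEAL_CLASSES = 360  # 6 classes a week for 60 weeks

def child_adherence(attended, offered):
    """Child-level measures a), b) and c) for one child."""
    return {
        "attended_of_ideal": attended / IDEAL_CLASSES,
        "offered_of_ideal": offered / IDEAL_CLASSES,
        # Assumption: undefined (None) when no classes were offered.
        "attended_of_offered": attended / offered if offered > 0 else None,
    }

def village_adherence(children):
    """Village-level measures; `children` is a list of (attended, offered) pairs."""
    n_children = len(children)
    total_attended = sum(a for a, _ in children)
    total_offered = sum(o for _, o in children)
    return {
        "attended_of_ideal": total_attended / (IDEAL_CLASSES * n_children),
        "offered_of_ideal": total_offered / (IDEAL_CLASSES * n_children),
        "attended_of_offered": total_attended / total_offered if total_offered > 0 else None,
    }

def adherence_band(proportion):
    """Map a proportion to the adherence bands used in Table 1."""
    if proportion == 0:
        return "0"
    if proportion <= 0.25:
        return ">0 to 25%"
    if proportion <= 0.50:
        return ">25% to 50%"
    if proportion <= 0.75:
        return ">50% to 75%"
    # Assumption: proportions above 1 also fall in the top band.
    return ">75% to 100%"

# Hypothetical village with two children.
village = [(270, 300), (90, 300)]
child = child_adherence(*village[0])
summary = village_adherence(village)
```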

Table 1 Adherence, intervention arm only

Definition of protocol deviations for the trial

Deviation from the protocol is defined as either 1) an intervention village not receiving any of the intervention during the trial intervention period, or 2) a control village receiving the intervention during the trial intervention period. Such protocol deviations will be listed.

Analysis populations

The primary analysis will follow the intention to treat principle.

For the primary outcome two secondary per-protocol analyses will be performed, one corresponding to each of the “attended as a proportion of ideal” measures of adherence defined above (child level and village level). In each case the per-protocol analysis will be restricted to those with adherence above 75%.

Trial population

Screening data

The CONSORT Flow diagram summarises the identification, randomisation and reasons for withdrawal of villages and children within the trial. The diagram (shown in Fig. 1) will show numbers of villages approached but not randomised, with reasons listed.

Fig. 1 CONSORT flow diagram

Eligibility criteria

A village was potentially eligible if the following conditions were met:

  1. Village in Satna district, except villages in the tehsils of: Birsinghpur, Majhgawan and Raghurajnagar;

  2. Village population less than 2500;

  3. Village has more than 120 children under the age of 6 and at least 15 children eligible for the intervention;

  4. Village is accessible by road;

  5. Village centre is at least 5 km from a Community Health Centre (CHC);

  6. Village centre is at least 3 km from the centre of any other included village.
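The village criteria listed above can be expressed as a simple screening check. This is a sketch only: the record layout and field names are hypothetical, and the distances are assumed to have been computed beforehand.

```python
from dataclasses import dataclass

EXCLUDED_TEHSILS = {"Birsinghpur", "Majhgawan", "Raghurajnagar"}

@dataclass
class Village:
    # All field names are hypothetical, chosen for this illustration.
    district: str
    tehsil: str
    population: int
    children_under_6: int
    eligible_children: int
    road_access: bool
    km_to_chc: float
    km_to_nearest_included_village: float

def is_potentially_eligible(v: Village) -> bool:
    """Apply the six village-level criteria in the order listed."""
    return (
        v.district == "Satna" and v.tehsil not in EXCLUDED_TEHSILS
        and v.population < 2500
        and v.children_under_6 > 120 and v.eligible_children >= 15
        and v.road_access
        and v.km_to_chc >= 5
        and v.km_to_nearest_included_village >= 3
    )

# Hypothetical villages; "ExampleTehsil" is a placeholder, not a real tehsil.
ok_village = Village("Satna", "ExampleTehsil", 1487, 230, 29, True, 7.5, 3.2)
too_close = Village("Satna", "ExampleTehsil", 1487, 230, 29, True, 4.0, 3.2)
```

The 3 km buffer depends on which other villages are included, so in practice this criterion would have to be applied across the whole set of candidate villages rather than one village at a time.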

A child was eligible if he or she was resident in a village within an eligible cluster at the time of enumeration, and fit the following criteria:

  1. He or she did not attend first grade or higher in the 2017–2018 academic year;

  2. He or she was expected to be resident in the village during 2018–2019;

  3. The child’s caregiver intended to enrol the child in the first grade in the 2018–2019 academic year;

  4. He or she was born between 16 June 2010 and 15 June 2013;

  5. The caregiver consented to allow the child to participate in the trial.

A child was also eligible during the catch-up enumeration (carried out before randomisation) if:

  1. He or she was born between 16 June 2010 and 15 June 2013;

  2. He or she was enrolled in first grade in the 2018–2019 academic year or was planning to enter first grade in the 2019–2020 academic year;

  3. He or she was expected to be resident in the village during 2019–2020;

  4. The caregiver consented to allow the child to participate in the trial.

Recruitment Information to be included in the CONSORT flow diagram

This is described in the Trial Population section.

Withdrawal/follow-up

No clusters withdrew from the trial.

Children who have withdrawn will be considered to be those enrolled children whose caregivers subsequently rescinded consent for the child’s participation in the trial.

Loss to follow-up for the primary outcome will be considered to be children who do not attend both endline tests. For secondary outcomes, loss to follow-up will be considered to be children whose caregiver was not interviewed at the endline survey.

Baseline patient characteristics

The following baseline characteristics will be tabulated by treatment arm. No baseline hypothesis tests will be carried out. For categorical variables the overall proportions (with numerators and denominators) will be shown as will the mean and standard deviation of the cluster level proportions. For continuous variables the overall mean and standard deviation will be shown along with the mean and standard deviation of the cluster level means.
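The two-level summary described above for categorical variables can be sketched as follows for a binary characteristic. The function name and the data layout are hypothetical. Continuous variables would be handled in the same way, with cluster-level means in place of cluster-level proportions.

```python
from collections import defaultdict
from statistics import mean, stdev

def summarise_binary(records):
    """Summarise a 0/1 baseline characteristic within one trial arm.

    `records` is a list of (cluster_id, value) pairs with value 0 or 1.
    Returns the overall proportion with its numerator and denominator,
    plus the mean and SD of the cluster-level proportions.
    """
    by_cluster = defaultdict(list)
    for cluster_id, value in records:
        by_cluster[cluster_id].append(value)

    numerator = sum(value for _, value in records)
    denominator = len(records)
    cluster_proportions = [mean(values) for values in by_cluster.values()]
    return {
        "numerator": numerator,
        "denominator": denominator,
        "overall_proportion": numerator / denominator,
        "mean_cluster_proportion": mean(cluster_proportions),
        "sd_cluster_proportion": stdev(cluster_proportions),
    }

# Hypothetical arm with two villages of different sizes.
records = [("A", 1), ("A", 1), ("A", 1), ("A", 0), ("B", 1), ("B", 0)]
summary = summarise_binary(records)
```

The overall proportion weights every child equally, whereas the mean of cluster-level proportions weights every village equally, so the two can differ when village sizes vary.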

Cluster-level variables (Table 2):

  a) Village size

  b) Distance to Community Health Centre/Civil Hospital

Table 2 Baseline characteristics of villages

Individual-level variables (Table 3):

  a) Gender

  b) Child’s age

  c) Religion

  d) Caste

  e) Primary female caregiver (i.e., mother or other)

  f) Literacy of female primary caregiver

  g) Education level of female primary caregiver

  h) Primary male caregiver (i.e., father or other)

  i) Literacy of male primary caregiver

  j) Education level of male primary caregiver

  k) Parents still alive at baseline

  l) Wealth index 1. Determined by the material the house is made of: (1) floor, roof and wall materials all natural; (2) some, but not all, of floor, roof and wall materials are synthetic; (3) floor, roof and wall materials all synthetic (as in Eble et al., 2020 [3]).

  m) Wealth index 2. Number of items (television, radio, motorbike, 4-wheeled vehicle) owned by the household members.

Table 3 Baseline characteristics

Analysis

Outcomes

The primary outcome of the trial is the composite literacy and numeracy test score using the EGRA and EGMA, respectively (Table 4 with subgroup analysis in Table 5). A sensitivity analysis will be carried out omitting the score from EGRA subtask 5b question 1, which was judged to be potentially misleading.

Table 4 EGRA and EGMA test results
Table 5 Composite test scores by subgroup, with interaction tests

Secondary outcomes include the separate scores for literacy and numeracy; caregivers’ engagement on child learning; enrolment in school at the end of follow-up; caregiver’s report of school attendance and the cost effectiveness of the intervention.

Secondary outcomes to be formally tested and a 95% confidence interval constructed are as follows.

  • Mathematics test score, to be calculated as a simple arithmetic mean of the percentage of correct answers on each of the six (some composite) subtasks, evenly weighting each task and not accounting for time remaining. The six subtasks are 1, 2, 3, 4 [mean of 4a and 4b], 5 [mean of 5a and 5b] and 6 (Table 4).

  • Language test score, to be calculated as a simple arithmetic mean of the percentage of correct answers on each of the seven subtasks, evenly weighting each task and not accounting for time remaining. The seven subtasks are 1, 2, 3, 4, 5a, 5b and 6. A sensitivity analysis will be carried out omitting the score from EGRA subtask 5b question 1, which was judged to be potentially misleading (Table 4).

  • Midline test scores (mathematics and language, Table 6).

  • Whether child is enrolled in school at the endline survey (Table 7).

  • Number of hours caregiver spends engaging child in reading or writing activities post lockdown (Table 8).

  • Caregiver’s report of school attendance: number of days of school missed in the past two weeks, conditional on enrolment, as recorded in the endline survey (Table 9).

  • Cost per 0.1 standard deviation improvement in the primary outcome. The standard deviation to be estimated by fitting a linear mixed model with cluster-specific random effects to the primary outcome in the control arm of the trial, with the standard deviation estimated via a summation of the between- and within-cluster variances. The included costs will be all costs for running the intervention and any capital costs will be amortized according to the item. It will include all costs that would occur if the trial intervention were continued without the research costs related to a trial. It does not reflect the costs that a government organization would observe if they took over the intervention. It does not include any costs to families.

Table 6 Midline test results
Table 7 Children enrolled in school
Table 8 Learning support (endline)
Table 9 Reported attendance in school, among those enrolled (endline)
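The mathematics and language test scores defined in the list above can be computed as in the following sketch. The dictionary keys follow the subtask labels in the text, while the function names and example percentages are hypothetical. The omission of EGRA subtask 5b question 1 for the sensitivity analysis is not shown.

```python
from statistics import mean

def mathematics_score(pct):
    """Mean percentage correct over the six EGMA subtasks.

    `pct` maps subtask labels to percentage correct (0-100). Subtasks 4 and 5
    are composites: each is the mean of its two parts.
    """
    components = [
        pct["1"],
        pct["2"],
        pct["3"],
        mean([pct["4a"], pct["4b"]]),
        mean([pct["5a"], pct["5b"]]),
        pct["6"],
    ]
    return mean(components)

def language_score(pct):
    """Mean percentage correct over the seven EGRA subtasks, evenly weighted."""
    return mean([pct[label] for label in ["1", "2", "3", "4", "5a", "5b", "6"]])

# Hypothetical percentages for one child.
egma = {"1": 80, "2": 60, "3": 70, "4a": 50, "4b": 30, "5a": 100, "5b": 60, "6": 40}
egra = {"1": 90, "2": 80, "3": 70, "4": 60, "5a": 50, "5b": 40, "6": 30}
maths = mathematics_score(egma)
language = language_score(egra)
```

Note the asymmetry in the definitions: 5a and 5b are averaged into one component for mathematics but count as two separate components for language.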

Secondary outcomes to be tabulated but not formally tested

  • Mathematics test score on the combined timed subtasks, to be calculated as a simple arithmetic mean of the fluency measures on each of timed subtasks (Table 4).

  • Language test score on the combined timed subtasks, to be calculated as a simple arithmetic mean of the fluency measures on each of the timed subtasks (Table 4).

  • Mathematics test score on the combined untimed subtasks, to be calculated as a simple arithmetic mean of the percentage of correct answers on each of the subtasks, evenly weighting each task (Table 4).

  • Language test score on the combined untimed subtasks, to be calculated as a simple arithmetic mean of the percentage of correct answers on each of the subtasks, evenly weighting each task (Table 4).

  • Whether child is enrolled in school before and after the COVID-19 lockdown (midline survey, Table 7).

  • Child’s residence status (Table 10).

    ◦ Data sources:

      ▪ Midline

      ▪ Endline

  • Grade (number 0–5) child is enrolled in during each phase of the trial (Table 11).

    ◦ Data sources:

      ▪ Midline pre-lockdown

      ▪ Midline post-lockdown

      ▪ Endline

  • Challenges faced during COVID-19 lockdown (Table 12).

    ◦ Any challenges faced?

    ◦ Specific challenges faced:

      ▪ No smartphone

      ▪ Limited access to smartphone

      ▪ Internet connectivity issues

      ▪ Internet costs too expensive

      ▪ Electricity issues

      ▪ Lack of school teacher support

      ▪ Lack of time to help child

      ▪ Low knowledge of technology

      ▪ Child not interested

      ▪ No money for a private tutor

Table 10 Children resident in study village
Table 11 School grade of child
Table 12 Covid-19 challenges faced (midline)

  • Learning support provided by family, school teachers, NGOs and/or private tutors during the time when schools were closed (Table 13).

    ◦ Help at home to study

    ◦ Educational activities using online videos, recorded classes or games found on educational mobile learning apps/websites

    ◦ Educational activities using textbooks or worksheets

    ◦ Source of textbooks/worksheets (schoolteacher, caregiver/family, NGOs, private tutor)

    ◦ Items purchased by family specifically to support education:

      ▪ Smartphone

      ▪ Tablet

      ▪ Computer

Table 13 Learning support (midline)

  • Spending on school materials, school fees and out of school tuition (Table 14).

Table 14 Spending (midline) in Rupees

Analysis methods

In the primary analysis of the primary outcome, child-specific composite test scores at endline will be compared between intervention and control arms using a linear regression model with randomisation arm and the stratification factors (and no other variables) as predictor variables. To take account of the cluster-randomisation, robust standard errors, allowing for the clustering, will be used here and elsewhere. Linear mixed models (with cluster as a random effect) which are also termed hierarchical or multilevel models are commonly used for the analysis of cluster randomised trials. The advantage of an approach using robust standard errors over linear mixed models is that homoscedasticity assumptions are not made.
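The estimator described above, ordinary least squares with cluster-robust (sandwich) standard errors, can be sketched in plain Python. This is an illustration rather than the analysis code: the data are hypothetical, the stratification factors are omitted (they would enter as extra columns of the design matrix), and the small-sample adjustment G/(G−1) × (N−1)/(N−K) is an assumed convention. The planned analysis will use Stata and/or R.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def cluster_robust_ols(X, y, clusters):
    """Return OLS coefficients and cluster-robust standard errors."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    bread = invert(xtx)
    beta = [sum(bread[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]

    # Sum the score contributions x_i * e_i within each cluster.
    scores = {}
    for i in range(n):
        s = scores.setdefault(clusters[i], [0.0] * k)
        for a in range(k):
            s[a] += X[i][a] * resid[i]
    g = len(scores)
    meat = [[sum(s[a] * s[b] for s in scores.values()) for b in range(k)] for a in range(k)]

    # Sandwich: bread * meat * bread, with an assumed small-sample adjustment.
    adjust = (g / (g - 1)) * ((n - 1) / (n - k))
    bm = [[sum(bread[a][c] * meat[c][b] for c in range(k)) for b in range(k)] for a in range(k)]
    var = [[adjust * sum(bm[a][c] * bread[c][b] for c in range(k)) for b in range(k)]
           for a in range(k)]
    return beta, [math.sqrt(var[a][a]) for a in range(k)]

# Hypothetical data: 4 villages of 2 children; columns are intercept and arm.
village_ids = [1, 1, 2, 2, 3, 3, 4, 4]
arm = [0, 0, 0, 0, 1, 1, 1, 1]
scores_endline = [40.0, 44.0, 50.0, 46.0, 55.0, 59.0, 61.0, 65.0]
design = [[1.0, float(a)] for a in arm]
beta, se = cluster_robust_ols(design, scores_endline, village_ids)
```

With only an intercept and the arm indicator, the arm coefficient equals the difference in arm means. The clustering affects only the standard error.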

The adjusted difference in means will be divided by the SD of the test score in the control arm to give a standardised difference, with a nonparametric bootstrap confidence interval (bias corrected and accelerated, 2000 replications at cluster level) computed for this.

Secondary outcomes that are continuous will be analysed using the same approach as above.

Secondary analyses will extend the linear regression model (with robust standard errors that allow for clustering) for the primary outcome described above to (separately) investigate interactions by caste, gender, male and female primary caregiver literacy, village population and wealth.

Secondary outcomes that are dichotomous (such as whether the child was enrolled in school) will be expressed as odds ratios with 95% confidence intervals obtained from a GEE model with a binary outcome, a logit link, and a ‘working’ assumption of independence, with robust standard errors to take account of clustering.

Adjustment for covariates

These are described in the Analysis methods section above.

Methods used for assumptions to be checked for statistical methods

The linear regression models used for the primary analysis assume that residuals are normally distributed. Robust standard errors allow for potential heteroscedasticity according to levels of predictor variables, but do make an assumption of normality conditional on levels of predictor variables. This assumption will be checked by examination of appropriate quantile–quantile plots of standardised residuals. The central limit theorem ensures that results are robust provided that violations of the normality assumptions are not substantial. Minor violations, even if statistically significant, are of little practical consequence. For this reason, formal hypothesis tests of normality assumptions will not be carried out.

Alternative methods to be used if distributional assumptions do not hold

Nonparametric bootstrap confidence intervals (bias corrected and accelerated, 2000 replications at cluster level) will be reported if the normality assumptions are seriously violated.

Sensitivity analyses for each outcome where applicable

In the primary analysis, missing data will not be imputed. In secondary analyses of the primary outcome and key secondary outcomes, multiple imputation by chained equations (MICE) will be used. For analysis of clustered data it is important that the model for imputation includes cluster-specific random effects [7]. Such analyses will be carried out using the jomo package within the statistical package R [8]. Imputation will be carried out separately in each trial arm. Auxiliary variables to potentially be used will include the randomisation stratification factors, caste, gender, male and female primary caregiver literacy, the wealth indices, the adherence to intervention variables defined above, the midline test scores, enrolment at endline, the number of hours the caregiver spends engaging child in reading or writing activities post lockdown, the caregiver’s report of school attendance, whether or not the child is enrolled in school before and after the COVID-19 lockdown, school grade at endline, the child’s residence status and the variables quantifying the learning support (and spending) provided by family, school teachers, NGOs and/or private tutors during the time when schools were closed.

If the effect of the intervention is statistically significant, and remains so in the MICE analysis detailed above, then the multiple imputation analysis will also be extended to determine the amount of bias, over and above that allowed for by the multiple imputation model, that would render the result of the primary analysis not statistically significant.

Subgroup analyses

We will conduct subgroup analyses (Table 5) of the primary outcome by:

  • Gender

  • Wealth index 1 (in three categories determined by the material the house is made of)

  • Wealth index 2 (in five categories determined by the number of relevant items owned by the household, with the interaction tested using a trend test).

  • Caste

  • Primary female caregiver literacy in 3 groups. This will be replaced by female education if more than 10% of the participants have a missing value for literacy and education status is not missing.

  • Primary male caregiver literacy in 3 groups. This will be replaced by male education if more than 10% of the participants have a missing value for literacy and education status is not missing.

  • Village population (above/below median)

For each of the above factors, statistical tests for interaction will be carried out, with claims of different effects in subgroups only made if there is strong evidence (p < 0.01) of an interaction.

Reporting and assumptions/statistical methods to handle missing data (e.g., multiple imputation)

These are described in the Sensitivity analysis section above.

Additional analyses

Additional analyses to be conducted include an economic evaluation calculating total average cost, and total average cost per 0.1 standard deviation improvement in the primary outcome. The standard deviation will be estimated by fitting a linear mixed model with cluster-specific random effects to the primary outcome in the control arm of the trial, with the standard deviation estimated via a summation of the between- and within-cluster variances. The included costs will be all costs for running the intervention, and any capital costs will be amortised according to the item. The costing will include all costs that would occur if the trial intervention were continued without the research costs related to a trial. It does not reflect the costs that a government organisation would observe if they took over the intervention. It does not include any costs to families.
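The arithmetic behind the cost-effectiveness measure can be sketched as follows. The total standard deviation is the square root of the sum of the between- and within-cluster variances, as described above. The form of the cost ratio (average cost per child divided by the number of 0.1 SD units gained) is an assumed interpretation, and all numbers are hypothetical.

```python
import math

def total_sd(between_cluster_variance, within_cluster_variance):
    """Total SD from the variance components of a random-intercept model."""
    return math.sqrt(between_cluster_variance + within_cluster_variance)

def cost_per_tenth_sd(average_cost_per_child, adjusted_difference, sd):
    """Cost per 0.1 SD improvement (assumed form of the ratio)."""
    standardised_effect = adjusted_difference / sd
    if standardised_effect <= 0:
        raise ValueError("ratio is only meaningful for a positive effect")
    return average_cost_per_child / (standardised_effect / 0.1)

# Hypothetical inputs: variance components from the control arm, a 5-point
# adjusted difference in composite score, and 1000 rupees per child.
sd = total_sd(between_cluster_variance=36.0, within_cluster_variance=64.0)
cost = cost_per_tenth_sd(average_cost_per_child=1000.0, adjusted_difference=5.0, sd=sd)
```

With these inputs the total SD is 10 points, the standardised effect is 0.5 SD, and the cost is about 200 rupees per 0.1 SD gained.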

Also, as a result of the COVID-19 lockdowns, additional support was provided to enrolled children and their mothers. Summary data relating to this will be tabulated. Data collected included the number of direct messages sent to children and the response rate to these messages, the number of home visits received, attendance of mothers at fortnightly meetings to encourage engagement, access to and use of books at local libraries, and access to and use of a tablet providing digital learning.

Statistical software

Stata version 17 (StataCorp. 2021. Stata Statistical Software: Release 17. College Station, TX: StataCorp LLC) and/or R (R Core Team 2022. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/).

Trial status and declarations

Trial status

The statistical analysis plan is based on the published protocol [4].

This is a cluster randomised trial, with all villages (clusters) randomised in 2019. Eligible children for the STRIPES2 trial were all enrolled prior to randomisation. Endline tests and surveys for STRIPES2 were conducted in 2022. Data cleaning for STRIPES2 is ongoing with possible return to the field for outstanding queries, prior to anticipated data-lock in May 2023.

Data management plan

The final EGRA and EGMA (literacy and numeracy) tests will be double-entered in the main office of the research team in Satna. The database has been developed by Sealed Envelope (https://www.sealedenvelope.com), an independent company contracted to construct and maintain a bespoke database for the trial, who will also keep a periodical backup of the data.

Trial master file, statistical master file and standard operating procedures

The trial master file is part of the standard operating procedures manual. The standard operating procedures manual is available upon request. The statistical master file is held securely and may be available upon request after final analyses.

Availability of data and materials

Data sharing is not applicable to this article (a statistical analysis plan) as no datasets will be generated or analysed during this stage of the study. After publication of the initial results, the anonymised datasets used and/or analysed during the trial with relevant statistical code will be available from the corresponding author on reasonable request.

References

  1. Annual Status of Education Report (Rural) 2018. New Delhi: ASER Centre; 2019. http://www.img.asercentre.org/docs/ASER%202018/Release%20Material/aserreport2018.pdf.

  2. Lakshminarayana R, Eble A, Bhakta P, Frost C, Boone P, Elbourne D, Mann V. The Support to Rural India’s Public Education System (STRIPES) Trial: A Cluster Randomised Controlled Trial of Supplementary Teaching, Learning Material and Material Support. PLoS ONE. 2013;8:e65775. https://doi.org/10.1371/journal.pone.0065775.

  3. Eble A, Frost C, Camara A, Bouy B, Bah M, Sivaraman M, Hsieh J, Jayanty C, Brady T, Gawron P, Vansteelandt S, Boone P, Elbourne D. How much can we remedy very low learning levels in rural parts of low-income countries? Impact and generalizability of a multi-pronged para-teacher intervention from a cluster-randomized trial in The Gambia. J Dev Econ. 2020;148:102539.

  4. Agarwal A, Banerji R, Boone P, Elbourne D, Fazzio I, Frost C, Gopal M, Karnati S, Nair R, Reddy H, Reddy P, Sharma D, Shekhawat SS, Shivalli S. Protocol for a cluster randomised trial in Madhya Pradesh, India: community health promotion and medical provision and impact on neonates (CHAMPION2); and support to rural India’s public education system and impact on numeracy and literacy scores (STRIPES2). Trials. 2020;21:569. https://doi.org/10.1186/s13063-020-04339-6.

  5. Dubeck MM, Gove A. The early grade reading assessment (EGRA): its theoretical foundation, purpose, and limitations. Int J Educ Dev. 2015;40:315–22.

  6. Platas LM, Ketterlin-Gellar L, Brombacher A, Sitabkhan Y. Early Grade Mathematics Assessment (EGMA) Toolkit. Research Triangle Park, NC: RTI International; 2014.

  7. Díaz-Ordaz K, Kenward MG, Cohen A, Coleman CL, Eldridge S. Are missing data adequately handled in cluster randomised trials? A systematic review and guidelines. Clin Trials. 2014;11(5):590–600. https://doi.org/10.1177/1740774514537136.

  8. Quartagno M, Carpenter J. jomo: A package for Multilevel Joint Modelling Multiple Imputation [Internet]. 2020. Available from: https://CRAN.R-project.org/package=jomo

  9. Weijer C, Grimshaw JM, Eccles MP, McRae AD, White A, Brehaut JC, et al. The Ottawa statement on the ethical design and conduct of cluster randomized trials. PLoS Med. 2012;9(11):e1001346.

Acknowledgements

We would like to acknowledge the work of A. Jaipal Reddy in designing the research activities; all the teams of supervisors and enumerators who arduously mapped and registered women and children; the data entry team who processed all the paper forms; the Sealed Envelope team; Tony Brady and Piotr Gawron for designing the database; Arjun Agarwal and Jitendra Ahirwar for helping develop the content; Nikhil Swaminathan for helping develop the internal measurement systems and processes for STRIPES2; Ketan Verma for helping to develop the midline assessment; and Dr Pei-tseng Jenny Hsieh and the National Foundation for Educational Research, UK team for helping to develop the endline assessment.

Funding

Effective Intervention NGO.

Effective Intervention, Centre for Economic Performance, London School of Economics, UK. Email: admin@effint.org.

SKe is supported by the Medical Research Council London Intercollegiate Doctoral Training Partnership Studentship (MR/N013638/1).

Author information

Contributions

SKe and CF led the development of the first draft with significant contribution from all authors. All authors contributed extensively to the design of the study and have contributed to, commented on and approved the final manuscript. The STRIPES2 intervention was designed by RB, DS, SSh, and colleagues from the Pratham Education Foundation team. SKa and HR provided field and data support for designing the research component. PB designed the economic analysis.

Corresponding author

Correspondence to Chris Frost.

Ethics declarations

Ethics approval and consent to participate

The Ethics Committees of L V PRASAD Eye Institute, Hyderabad, India (LEC 02–16–008) and London School of Hygiene and Tropical Medicine (LSHTM Ethics Ref: 10482) have approved the trial protocol. We have obtained the necessary approvals from the Indian Council of Medical Research, New Delhi and the Government of Madhya Pradesh to conduct this trial in Satna district. The trial complies with the Declaration of Helsinki, local laws, and the International Conference on Harmonisation Good Clinical Practice (ICH-GCP). Any protocol modifications will be communicated to both Ethics Committees, and consent will be re-obtained at the village and individual (woman or caregiver) level at that point if deemed necessary.

For this trial, we received approval from the Indian Council of Medical Research (ICMR), New Delhi, India. At the state level, approval of the protocol was obtained from the Department of Health & Family Welfare of the Government of Madhya Pradesh.

This trial employs multiple tiers of consent: village, individual, and individual on behalf of the child. Agreement to approach eligible villages was first obtained from the Sarpanch. In the trial villages, consent was obtained from the village after the trial had been presented in a meeting with village elders representing all the castes and village residents. Verbal consent was given during a village meeting, with written documentation (or thumbprint) of the approval given by the Sarpanch. This process of obtaining consent through meetings with approval of the “guardians” of the clusters is common in trials in which the intervention is delivered at the level of a cluster and it is not possible to obtain informed consent for randomisation from individuals within the cluster before a baseline survey.

Once the trial was accepted at the village meeting, the villages were considered eligible for baseline enumeration. During the baseline interview, each head of household, each potentially eligible woman and one parent or caregiver of each potentially eligible child were informed in the local language (Hindi) about the trial and their participation and asked for a signature or thumbprint to indicate their consent to join the trial. Only people who agreed to participate were enumerated. Women and caregivers of enumerated children have the right to withdraw consent at any time during the trial. This process of consent is compatible with current standards for cluster randomised trials [9].

Consent for publication

Participants (household heads, women, and caregivers on behalf of children) were informed that we would revisit the households to interview them about pregnancies, babies, and children’s school enrolment so that we could understand the impact of the CHAMPION2 and STRIPES2 programmes. All participants agreed that all individual information collected during interviews will be used only for research purposes and in ways that will not reveal their identity.

Competing interests

PB is the Executive Chair of EI; IF is a paid employee of EI but has no competing interests. DE and CF received research grant funding from EI but have no competing interests. SKe, NM and SiS are employed on these research grants but have no competing interests. SKa and HR receive research funding from EI but have no competing interests. RB, DS, and SSh declare a potential competing interest due to the involvement of Pratham Education Foundation (an independent organisation), which currently works to improve the quality of education in India. AE has no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Keddie, S., Fazzio, I., Shivalli, S. et al. Statistical analysis plan for a cluster randomised trial in Madhya Pradesh, India: support to rural India’s public education system and impact on numeracy and literacy scores (STRIPES2). Trials 24, 469 (2023). https://doi.org/10.1186/s13063-023-07453-3

  • DOI: https://doi.org/10.1186/s13063-023-07453-3