Trial design
The trial is a pragmatic cluster (by village) RCT with an additional matched control group. A clustered design was necessary as the program was implemented at a village level and thus individual recruitment was not sensible due to the potential for contamination. This design also allowed for a pragmatic evaluation of how the program is implemented by communities. As the aim is for the ECED initiatives to be sustainable after the period of the loan, it is additionally important to understand how communities implement the program from a process point of view.
As mentioned above, to facilitate the implementation of the ECED project, the 60 project villages per district were allocated to three implementation batches. It was planned that Batch 1 would receive the first block grants at the start of the project, and block grants for Batch 2 and Batch 3 were to follow after 9 and 18 months, respectively. The evaluation design made use of this feature of the implementation: a selection of villages was randomly allocated to either Batch 1 or Batch 3 (within each district).
Recruitment
The project as a whole covers 50 districts across Indonesia. By the time the MoNE expressed support for a randomized evaluation design, many districts had already decided when each of the 60 project villages would receive funding (that is, districts had already allocated villages to a particular batch), making it too late to implement a randomized roll-out of services in these districts. Districts that had not yet decided when beneficiary villages would receive the project were contacted by the MoNE and asked about their willingness to randomly allocate 20 villages out of 60 to Batches 1 and 3. As batch implementation was to begin 9 months apart, it was anticipated that there would be 18 months between Batches 1 and 3. In deciding upon the final ten districts to participate in the study, the MoNE made an effort to ensure geographic diversity with the goal of producing a sample that broadly represented project districts. These districts were: Sarolangun, Rembang, Gorontalo, Kulon Progo, Sidenreng Rappang (Sidrap), Majalengka, Ketapang, North Bengkulu, Middle Lombok and East Lampung. Random selection at the district level would of course have been preferable, as it would have made the evaluation districts representative of the project as a whole, but this was not possible given the implementation timing and districts' unwillingness to randomize.
Randomization of the 20 villages took place through a public lottery in each district, supervised by MoNE staff. The first ten villages selected were allocated to Batch 1, treatment, and the latter ten were allocated to Batch 3, control (which were to receive treatment 18 months later).
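Illustratively, the batch draw can be mimicked in a few lines (a sketch only: the village names, the seed and the function are ours, not part of the actual lottery procedure):

```python
import random

def lottery_allocation(villages, seed=None):
    """Mimic the public lottery: draw the 20 villages in a random order;
    the first ten drawn form Batch 1 (treatment), the remaining ten
    form Batch 3 (control, treated 18 months later)."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    draw_order = rng.sample(villages, k=len(villages))
    return draw_order[:10], draw_order[10:]

villages = [f"village_{i:02d}" for i in range(1, 21)]  # placeholder names
batch1, batch3 = lottery_allocation(villages, seed=2006)
```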
Matched control sample
Due to concerns that the randomization and/or the planned implementation timelines would not be adhered to, the research design also incorporated ten matched control villages per district. District officials were asked to nominate ten control villages that were not part of the ECED project (and would never receive the treatment) but were otherwise similar to the ten randomly selected Batch 1 treatment villages. The rationale for having two control groups was partly practical: to reduce the risk of being unable to identify impact because of unforeseen implementation challenges. It was also technical: 18 months (the anticipated gap between Batch 3 and Batch 1) was thought to be potentially insufficient to detect significantly different child development outcomes, so the matched controls allow for a comparison over the 3 years between baseline and endline rather than only the 18 months available with the randomized control. One foreseeable implementation challenge, common to projects of this kind, was that the project would start late and then accelerate as implementation progressed, meaning that fewer than 18 months might elapse before controls received the treatment. The matched controls thus make the study less vulnerable to changes in timing and enable the tracking of longer-term impacts. In sum, the project design comprised one treatment group of 100 villages and two control groups of 100 villages each, thus 300 sample villages in total.
Deviations from original random assignment
In December 2008, before the baseline survey had commenced but at the time the treatment villages received their grants (and before services began), a World Bank team discovered that five districts had not complied with the village level randomization. Specifically, these districts had implemented the project in an order different from what was originally agreed with the MoNE and the World Bank in 2006. The main reason the districts gave for their noncompliance was convenience: they felt it was easier to roll out the project to villages that were close to one another than to comply with the randomization design, which did not allow for clustering project sites. The districts that did not comply are shown in the participant flow diagram (Figure 1) with the number of villages that remained in the batch to which they were originally assigned. Note that full compliance would mean ten villages in each group and no changes.
This alteration of the implementation plan is of course a serious concern for the evaluation as it compromises the design and in turn the quality of the information that the study will be able to generate. The impact evaluation team decided to drop the survey work in the district (Gorontalo) where noncompliance was the most serious. The rationale for this change was that external validity was already compromised due to the fact that the ten districts chosen for the evaluation were not randomly chosen from the 50 project districts, but were chosen because they agreed to participate in the evaluation. The team also felt that with such poor compliance in Gorontalo there would be little possibility of exploiting the randomized design in the analyses.
In the district of Middle Lombok, all 60 project villages (not just 20 as in the other nine districts) were randomized to the staggered rollout. This was the independent choice of the district, not the study team. The sample of 30 villages originally allocated to Gorontalo was therefore replaced by villages in Middle Lombok: all 60 project villages in Middle Lombok were included in the study sample, increasing the total sample size by ten villages, to 310. The rationale for including all 60 villages was that it would offer more statistical power. Additionally, the inclusion of Batch 2 villages may provide some information about different periods of exposure to treatment.
Additionally, due to compliance issues, some of the villages selected as matched controls in 2006 were changed at the time of the baseline in 2008. Because the matching strategy paired each treatment village with an individually similar village, if a treatment (Batch 1) village changed, we also needed to identify a new, most appropriate matched control for it.
Sample size calculation
The sample size calculations for this experiment were driven by the average treatment effects at the village level under a cluster randomized design. It was challenging to estimate baseline variation between and within clusters for the outcome measures, and to set minimum detectable effect sizes, as a project of this kind was unprecedented in Indonesia and none of the intended outcome measures had ever been collected in the country. The research team therefore used a combination of education measures from Indonesia and ECED measures from a similar project in a country with comparable socioeconomic indicators, the Philippines.
In order to estimate changes in comparable outcome measures, the team calculated the intracluster correlation using net enrolment ratios in junior secondary education as observed in the 2004 Indonesian national household survey (Susenas). We used secondary rather than primary enrolment due to Indonesia’s near universal enrolment at the primary level. In general, one enumeration area in Susenas is a village, and in each area, 16 households are interviewed. The intracluster correlation between enumeration areas for junior secondary enrolment was estimated at 0.23, which was used in the subsequent power calculations.
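For reference, the standard one-way ANOVA estimator of an intracluster correlation can be sketched as below; this illustrates the general technique with toy data and is not the actual Susenas computation:

```python
def anova_icc(clusters):
    """One-way ANOVA estimator of the intracluster correlation for
    balanced clusters (for example, 16 households per enumeration area).
    Each inner list holds one outcome (0/1 enrolment) per household."""
    k, m = len(clusters), len(clusters[0])
    grand_mean = sum(sum(c) for c in clusters) / (k * m)
    means = [sum(c) / m for c in clusters]
    # Mean squares between and within clusters
    msb = m * sum((mu - grand_mean) ** 2 for mu in means) / (k - 1)
    msw = sum((x - mu) ** 2 for c, mu in zip(clusters, means) for x in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)

# Toy data: two enumeration areas of four households each
print(anova_icc([[1, 1, 1, 0], [0, 0, 0, 1]]))  # -> 0.2
```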
To predict our effect sizes for the sample calculations, we used results from a study conducted in the Philippines on the impact of a comparable, community-based ECED project [31]. The study evaluated an ECED initiative of the Philippine Government using longitudinal data collected over 3 years on a cohort of 6,693 children aged 0 to 4 years at baseline. The Philippine ECED program was especially relevant for this study as the Indonesian program was modeled on the Philippine example. The Philippine ECED program included a range of health, nutrition, early education and social service programs, and aimed to improve the survival and developmental potential of those children who were most disadvantaged and vulnerable. The Philippine ECED project did not introduce new services, but provided a child development worker to each of the program areas to link center-based and home-based services in an integrated multi-sector approach [32].
The Philippines study used outcome measures relating to gross motor skills, fine motor skills, expressive language, receptive language, cognitive skills, self-help and social-emotional development, and reported mostly positive impacts in the range of 0.3 to 1.1 standard deviations [32]. Using these impacts as the minimum detectable effect sizes for the Indonesian study, the minimum number of villages to sample was in the range of 7 to 116, depending on the chosen outcome measure, with half of this sample to be allocated to the treatment group and half to the control. We calculated the minimum number of clusters on the basis of a power of 0.9, with an estimated intracluster correlation coefficient (ICC) of 0.23 and with the requirement of ten children surveyed per village. The Philippines study, however, reported unusually high effect sizes, especially when compared to a review (published after the sampling for this study) by Nores and Barnett [33], which suggested an average effect size of 0.255 standard deviations for studies investigating the impact of cash transfers, nutritional, educational or mixed interventions conducted in low and middle income countries. Using this as the minimum detectable effect, keeping all else constant, the minimum number of villages required increases to 182. The study eventually sampled 200 villages randomized into treatment and control groups. The power calculations suggest that this would be more than sufficient to detect impact estimates of a similar size to those reported for the Philippine ECED intervention, and just sufficient to detect the average effects reported in Nores and Barnett’s review.
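The calculation above follows the standard approach for cluster randomized trials: the individually randomized sample size is inflated by the design effect 1 + (m − 1)ρ, where m is the cluster size and ρ the ICC. A sketch under these assumptions is given below; because the original calculation's exact conventions (for example, one- versus two-sided tests or finite-sample corrections) are not stated, the figures quoted in the text will not be reproduced exactly.

```python
import math
from statistics import NormalDist

def clusters_per_arm(mde_sd, icc, n_per_cluster, alpha=0.05, power=0.9):
    """Villages needed per arm for a cluster RCT with a standardized
    outcome (sigma = 1) and a two-sided test at level alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n_individual = 2 * z ** 2 / mde_sd ** 2        # per-arm n, no clustering
    deff = 1 + (n_per_cluster - 1) * icc           # design effect for clustering
    return math.ceil(n_individual * deff / n_per_cluster)

# ICC of 0.23 and ten children per village, as in the text
for mde in (1.1, 0.3, 0.255):
    k = clusters_per_arm(mde, icc=0.23, n_per_cluster=10)
    print(f"MDE {mde} SD -> {k} villages per arm ({2 * k} in total)")
```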
The size of the matched control group was set equal to the treatment group. In the analysis, we will also be able to present estimates of the impact that rely on standard difference in difference assumptions, comparing the treatment group with the matched control group. The sample size calculations as discussed above are valid under these assumptions; hence the same sample size for the matched control group.
Sampling procedures
The steps taken to obtain the final list of sampled villages are described in the Recruitment section above. The baseline survey contained 310 villages, of which 100 were originally allocated to Batch 1, 20 originally allocated to Batch 2, 100 originally allocated to Batch 3 and 90 allocated to the matched control group. The remainder of this section describes the sampling procedures followed within each village.
In each village, two dusuns (sub-village units, or hamlets) were sampled. At baseline it was not clear in which dusuns the project would operate, so the evaluation selected one dusun at random and also included the dusun in which the village head’s office was located. The rationale for selecting the dusun with the village head’s office was that it would have a higher concentration of people and would also be a likely location for early childhood services. We felt confident about detecting project participation without yet knowing the project placement, as the intervention aimed to have an influence at the village level. On average there are five dusuns per village [28].
Households were randomly selected during the baseline survey from a listing created for this evaluation. As individual level enumeration data are not available for Indonesia, the World Bank had to construct its own listing. To construct it, fieldworkers went to each village, met the village and dusun heads, and asked for a roster, by dusun, of all resident households with a child aged between 0 and 6 years. In some areas, if the information from the dusun head was incomplete or questionable, it was cross-checked against records from the posyandu head (integrated child health services clinic) and/or a midwife. In each dusun, survey teams interviewed ten households with children aged 1 and/or 4 years, amounting to 20 households per village and thus 6,200 households in total.
In the case where a sampled dusun had an insufficient number of children in the target ages, field teams randomly selected another dusun, and continued this random selection process until meeting the required 20 households per village. If all the households with children in the target age range as per the listing had been contacted across the entire village, but the survey team still had not interviewed 20 households, then supervisors could ask people in the village if there were other children in the village that met the age criteria (that is, snowballing), and interview them.
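As a sketch, the within-village selection and fallback rules might look as follows (an illustrative reading of the protocol; the data structures and function names are ours, and the final snowballing step is left to the field teams):

```python
import random

def sample_village_households(dusun_listings, rng, target=20, per_dusun=10):
    """dusun_listings maps a dusun name to its household listing, given as
    (household_id, list_of_child_ages) pairs. Draw up to `per_dusun`
    eligible households (a child aged 1 and/or 4 years) from each dusun,
    adding randomly chosen dusuns until `target` households are reached
    or the village listing is exhausted."""
    def eligible(listing):
        return [hid for hid, ages in listing if 1 in ages or 4 in ages]

    remaining = dict(dusun_listings)
    sampled = []
    while remaining and len(sampled) < target:
        dusun = rng.choice(sorted(remaining))   # pick the next dusun at random
        pool = eligible(remaining.pop(dusun))
        take = min(per_dusun, target - len(sampled), len(pool))
        sampled.extend(rng.sample(pool, k=take))
    return sampled
```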
The objective of sampling households with children aged 1 and/or 4 years in each village was to follow children who would span the ages of 1 to 7 years during the life of the survey, capturing most of the target age range of the project. Children who were aged 1 year at baseline will have reached the age of 4 years at endline, thus providing an additional comparison with the cohort of children aged 4 years old at baseline. The cohort analysis of the children aged 1 year will provide information on the effectiveness of ECED interventions focused at very early childhood, mostly related to health (nutrition, vaccinations, fine motor skills and gross motor skills) and parenting. Children who were aged 4 years old at baseline will be around 7 years old at endline, and will most likely be enrolled in the early years of primary school. At this point, this cohort will have reached an age where we can assess school performance, and shed light on the project’s impact on increasing school readiness. The analysis based on the older age cohort will provide information on the effectiveness of the ECED interventions aimed at children in the age range of 4 to 6 years old.
Midline replacement procedures
Since this study aimed to track the same children over three rounds of data collection, the study team made every attempt to interview the baseline children again at midline in 2010. Across all nine districts, 112 children were not reached at midline. The most common reasons a child was not available for the midline interview were: the household could not be found (17 children), the child had died (nine children), the family refused (five children) or other reasons (81 children).
In the case of children moving, field teams adhered to the following protocol. First, the teams tried to assess whether the child had moved to another village (in the same district) that was in the study sample. If the child had, the child and household were interviewed in the new village. One hundred and one of the children had moved to another village prior to midline, but were able to be identified and interviewed in their new village. Second, if tracking a child to a sample village was impossible, the household was replaced using the same identification procedure from the baseline. The field teams used the dusun level listing from the two sample dusuns, and if this listing did not yield 20 households per village, then the field teams used the procedure outlined above for randomly selecting a new dusun. While replacements were made, the reason for them was not always recorded, and the exact number of children who had moved to another village and were no longer able to be tracked is unknown.
Other reasons for replacing or adding new households included: i) baseline teams had erred in interviewing children in the wrong age group in baseline; or ii) baseline teams were not able to identify 20 households with children who met the age criteria, and thus teams in the midline attempted to search again and add households. While tracking children outside of a sample area might have been possible, the research team decided against it for reasons of cost and questionable benefit to the evaluation. All children who were lost to follow-up at midline were replaced. The participant flow diagram (Figure 1) provides details regarding the number of children lost to follow-up in each study district.
Outcome measures
As mentioned above, the primary objective of this study is to establish the impact of the ECED project on early childhood development outcomes. The results are also to inform the pathways of impact and provide greater insight into the patterns and correlates of child development in Indonesia. In order to understand the pathways not just at a household level but also at a community level, the evaluation makes use of questionnaires covering the state of children’s development and the context in which children develop. At baseline, there were six separate questionnaires: a child questionnaire that included child tasks, a caregiver questionnaire, a household head questionnaire, a village midwife questionnaire, a posyandu (integrated child health services clinic) questionnaire and a village head questionnaire. The midwife and posyandu were included in the baseline data collection as these are publicly provided health services targeted at young children and are often the most common form of ECED services in a village. For the midline data collection, the midwife and posyandu questionnaires were replaced with a dusun level questionnaire related to project services (in treatment areas) and a village level questionnaire related to ECED services (in treatment and control areas). The midwife and posyandu questionnaires were dropped mainly for cost reasons. While the information from these service providers was valuable, the research team found it more beneficial to gather data on project implementation and more comprehensive information on ECED service availability and provision, and it was not feasible to field four village level questionnaires in the midline data collection.
Instrument development and piloting
Access to services, service utilization, asset inventory, dwelling attributes and household consumption history were collected using a series of questions previously constructed for the Indonesian Family Life Survey (IFLS), a renowned panel study focusing on demographic, poverty and health measures. The child and caregiver questionnaires include several internationally recognized standard scale instruments and anthropometric measures. We also collected information about the child’s development and health history from the parent/caregiver. None of the standard tools used to measure childhood development had been applied in Indonesia before. As such, the instruments had to be translated and adapted to Indonesian culture and circumstances, and the study design meant that some of the standard measures of child development needed to be amended.
Early Development Instrument (EDI)
The Early Development Instrument (EDI) is generally completed for each child in the kindergarten classroom by the child’s teacher. In Indonesia, very few children attend school at the age of 4 years, particularly in poor regions, and thus the first variation was that the EDI was completed by an interviewer with the caregiver of the child. The full EDI consists of a core checklist of 104 items covering five major areas of child development and 16 sub-domains; the EDI also comes in a short form with 48 items. Given the number and length of the instruments being used in the impact evaluation, and out of concern for respondent burden, it was decided to use the short form of the EDI from the outset. It is acknowledged that the author of the EDI recommends against using the short form until the full EDI has been piloted in a country and all of its items analyzed [34].
Strengths and Difficulties Questionnaire (SDQ)
The Strengths and Difficulties Questionnaire (SDQ) [35], which is also an informant-based assessment of a child, is likewise used internationally. The SDQ had already been translated into Indonesian by Wiguna and Hestyanti [36]; however, no validation data for the translated version have been published.
Dimensional Change Card Sort (DCCS)
As a test of executive cognitive functioning, we included the Dimensional Change Card Sort (DCCS) task [37, 38]. This task involves a child sorting cards with pictures that vary by shape and color into two groups according to one dimension (by shape), and then switching to sorting them by the other dimension (by color). Traditionally the colors are red and blue and the shapes represent a rabbit and a boat. For the purposes of the ECED impact evaluation we changed the shapes to a motorbike and a cat as these are more familiar to children in Indonesia.
Child tasks
Various tasks similar to those used in the Ages and Stages Questionnaire (ASQ) [39] and an instrument used in the Philippines for another World Bank project [31] were also included in the ECED impact evaluation. These included fine motor skills tasks (threading string through a bead), gross motor skills (sitting through to walking backwards, hopping on one foot and throwing a ball overhead), cognitive assessment (by drawing a circle, a person and a house), and expressive and receptive language (by identifying pictures and parts of their body).
Content validity and piloting of the instruments/questionnaires
Content validity was established over a period of 4 years from 2006 to 2009, involving meetings, focus groups and various pilots in the field. Initial meetings were held with staff from the Government of Indonesia who had expertise in early child development. Translators were present at these initial meetings. No modifications to the instruments were made at this stage other than the inclusion of a set of questions regarding the child’s knowledge of Muslim prayers and religious practices. This version of the instrument was piloted in a mix of poor urban and rural villages near Majalengka (West Java) in March 2007, yielding a total of eight completed questionnaires.
The pilot revealed difficulties in the translation of response ranges (that is, ‘sometimes agree’ through to ‘sometimes disagree’), with respondents showing a distinct preference for simple yes or no answers. The interviewers were government staff from the Department of Early Childhood (PAUD) within the MoNE and one experienced fieldworker who had worked on the IFLS and other significant surveys in Indonesia. There was much debate among the interviewers regarding the five ordered response categories and whether it was possible to elicit a range from respondents, with the expected response in the Indonesian language being ‘can’ (bisa) or ‘cannot yet’ (tidak bisa). In addition, the interviewers were concerned about a strong social desirability response bias displayed by the children’s caregivers. After this pilot testing, the study team added a series of child tasks, fieldworker observations and repeat questions within the instrument suite to try to counter the social desirability bias and increase our ability to assess reliability. Further translations and back-translations were undertaken to improve the understanding/interpretability of the ranged responses.
In September 2007, this amended full suite of instruments was piloted in Subang (West Java) with 26 households. As in the previous pilot, there was still difficulty with the ranged response categories. In addition, questions that required some degree of subjectivity in the answer seemed to be very difficult for respondents (for example, ‘How would you rate your child’s overall social/emotional development?’). Such questions required greater guidelines and training for the interviewers.
Baseline pilot study
After another series of modifications, the instruments were again back-translated to English for checking before a final and comprehensive pilot. This baseline pilot was conducted in January 2009 in Yogyakarta and Kebumen in Central Java by the survey firm, Bina Karya, which was hired by the MoNE to carry out the baseline survey. This pilot surveyed 200 households (children, caregivers and household heads), 20 midwives or posyandu (integrated child health services clinic) heads and ten village heads. The purpose of this pilot was primarily to validate the instruments, especially for the children, to ensure that the instruments captured a range of child capabilities and did not pose ceiling or floor effects. Secondly, the pilot was to test the comprehension of survey questions by respondents and the overall flow of the questionnaire, and serve as an opportunity for Bina Karya to practice the logistics of implementing the survey.
Quality control
Project quality control
Frequent project coordination meetings with government have been, and continue to be, conducted by the World Bank, as is standard practice with any loan to a country. In addition, the World Bank’s project management team based in the Indonesian office generally organizes two annual formal review sessions, each producing an aide-mémoire which documents the project’s accomplishments, challenges and proposed remedies. The project management team is in daily contact with the MoNE and makes frequent visits to oversee project implementation.
Baseline survey quality control
Fieldwork training took place in March 2009 over 5 days in three locations throughout the country: Yogyakarta (Central Java), Lampung (Sumatra) and Mataram (Lombok). Training covered the details of the fieldwork manual, including protocols for obtaining consent, ensuring confidentiality, how to probe effectively and questionnaire content. The training also included a module for data entry staff to become familiar with the data entry program (the manual and program were developed by World Bank staff), and a field practice with live respondents.
Field teams used paper questionnaires but data were entered shortly after enumeration as data editors were co-located with enumerators during fieldwork. Baseline fieldwork took place from March to June 2009. Each field team consisted of five to six fieldworkers and a team supervisor. It was the supervisor’s responsibility to: gain support/permit from the village head, conduct the village level instruments (village head, midwife and posyandu), check all surveys for completeness and quality before leaving the village, and keep track of participation.
Baseline survey quality control was carried out by a combination of staff from the survey firm’s leadership team, the MoNE and the World Bank. Quality control during survey implementation included attending the pilot and training, frequent visits to the field to accompany supervisors and enumerators during questionnaire administration, and observing the process of data editing and entry. Survey firm supervisors and several MoNE and World Bank staff also revisited households to verify data collected by enumerators. Subsequent to fieldwork, MoNE and World Bank staff were involved in observing the second data entry and cleaning process which took place at Bina Karya offices in Jakarta. Throughout the entire survey effort, MoNE and World Bank staff provided feedback and recommendations to Bina Karya staff regarding measures to improve fieldwork management, interviewing quality, data entry processes, and clarifying instrument content.
Despite these quality checks and processes, there were still problems with data quality. For example, one fieldwork team was found to have fabricated its data entirely. Once this was discovered, a new fieldwork team returned to those villages and recollected the data properly.
Midline survey quality control
Many of the same procedures and protocols for the baseline were repeated for the midline data collection, with some exceptions. First, some questionnaire content was altered to reflect the baseline experience, for example, questions that were found to be ineffective, showing little variation or proving not to be understood by respondents were revised or dropped. The team also added a different suite of questionnaires at the village and dusun levels related to project implementation and overall ECED service provision and availability. Second, the MoNE implemented the survey differently. The MoNE recruited a team of five highly experienced survey consultants to lead the survey, along with an event organizing firm, which handled all administration and logistics of the survey, but not survey content or quality. The lead consultants were responsible for recruiting all fieldworkers and delivering high quality data.
Because of the changes to questionnaire content and implementing structure, two pilots were conducted prior to the midline survey. The first was conducted in Lombok in April and May 2010, and the second in Subang (West Java) in June 2010. The purpose of the pilots was similar to that of the baseline pilot, with a focus on the new questionnaires about ECED service provision and project implementation. Two pilots were conducted because the first was intended to test and refine questionnaire content, whereas the second was to practice the logistics of survey administration and obtain final estimates of survey administration time.
Midline survey training began with a 2-day workshop for district coordinators in Jakarta, followed by 7 days of fieldworker training in Bandung in late June/early July (thus all fieldworkers were trained together). Midline data collection took place in July and August 2010. Midline data collection quality control was largely similar to the baseline with survey team leadership, MoNE and World Bank staff making frequent field visits throughout the duration of the fieldwork to verify data collected and entered, accompany fieldworkers during interviews, and offer coaching and feedback regarding questionnaire administration and content. As with the baseline, a first round of data entry took place in the field with the second round of data entry in Jakarta.
Planned analyses
As noted above, there are several hypotheses under the trial. One of the risks when collecting a large set of outcome variables is to highlight false positives in the final analysis, that is, significant coefficients are identified in the final analysis when, in fact, these could be due to natural randomness. To reduce this risk, for each hypothesis below we list (in bullets) which outcome variables we will use to test our research hypotheses. We believe that by limiting the analyses to a parsimonious set of variables defined ex ante, the risk of false positives is greatly reduced. The hypotheses under the trial are whether the ECED program has led to:
1) Greater access to ECED services
2) Higher participation rates in ECED services
3) Higher enrolment rates in school at earlier ages
4) More ‘school ready’ children, measured by:
   - The summary values of the EDI in the domains of physical health and well-being, social competence, emotional maturity, language and cognitive skills, and communication skills and general knowledge
   - Mean scores on the SDQ in the areas of emotional symptoms, conduct problems, hyperactivity/inattention, peer relationship problems and pro-social behavior
   - The average number of stages children pass on the DCCS task (there are three stages)
   - The fraction of tasks children can perform in the areas of gross motor skills, fine motor skills, language skills, cognitive skills and socio-emotional skills
5) Higher community awareness of the importance of ECED
6) Higher persistent breastfeeding rates, improved nutrition and improved early childhood stimulation, measured by:
   - The average number of months of exclusive breastfeeding during children’s first 6 months of life
   - Height-for-age and weight-for-height z-scores
   - The number of activities children engaged in during the last week (out of book reading, storytelling, drawing, music, playing with toys and helping with household chores) and the fraction of children who play outside
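As a rough illustration of how such child-level outcome variables are constructed from item-level survey responses, a minimal Python sketch follows (the function names, field names and data are hypothetical; the survey instruments define the actual items):

```python
def fraction_tasks_performed(task_results):
    """Fraction of developmental tasks a child can perform in one
    skill area (e.g. gross motor skills), given 1/0 pass/fail items."""
    if not task_results:
        return None  # no items administered for this child
    return sum(task_results) / len(task_results)

def activity_count(activities_last_week):
    """Number of stimulation activities, out of the six tracked
    (book reading, storytelling, drawing, music, playing with toys,
    helping with chores), the child engaged in during the last week."""
    tracked = {"book_reading", "storytelling", "drawing",
               "music", "toys", "chores"}
    return len(tracked & set(activities_last_week))

# Hypothetical child record
print(fraction_tasks_performed([1, 1, 0, 1]))            # 0.75
print(activity_count(["drawing", "toys", "swimming"]))   # 2 (swimming not tracked)
```

The averages of these child-level variables across a village then form the cluster-level outcomes used in the analysis.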
The first steps in the analysis will be to further validate the instruments used. This will be undertaken by assessing floor and ceiling effects, examining correlations between measurements, and investigating gradients with socio-economic indicators as well as with age and gender. These analyses will also provide a unique snapshot of the early development of a large cohort of Indonesian children [40].
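The floor/ceiling and correlation checks can be operationalized simply by computing the share of children at each scale's minimum and maximum, and pairwise correlations between measurements; a minimal sketch (the data and the informal 15% warning threshold in the comment are our illustrative choices, not part of the protocol):

```python
def floor_ceiling(scores, min_score, max_score):
    """Share of observations at the scale floor and ceiling.
    Shares above roughly 15% are often taken as a warning sign
    that the instrument cannot discriminate at that end."""
    n = len(scores)
    at_floor = sum(s == min_score for s in scores) / n
    at_ceiling = sum(s == max_score for s in scores) / n
    return at_floor, at_ceiling

def pearson_r(x, y):
    """Pearson correlation between two measurements on the same children."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

scores = [0, 0, 1, 2, 3, 3, 3, 3]
print(floor_ceiling(scores, 0, 3))  # (0.25, 0.5): pronounced ceiling effect
```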
Having two control groups, one experimental and one matched, is an unusual feature of this evaluation. Usually an experimental design would be the first choice and the entire sample would be allocated to it. This rule of thumb was not followed here because the effect of the intervention is expected to depend on the duration of exposure to the project. The largest measurable impact is therefore expected for children who live in Batch 1 villages at the end of the project, and measuring this impact is the main objective of the evaluation. The randomized control sample is of limited value for estimating it, as these villages will already have received the intervention by the end of the project; the matched control sample, on the other hand, will never receive the intervention. Using the observations from the Batch 1 villages and the matched control villages, the impact of the project can be estimated using:
Y_{ij2} = α + β T_j + γ' X_{ij0} + ε_{ij2}   (1)

where i denotes the child, j the village, and t = 0, 1, 2 indexes the baseline, midline and endline surveys. X_{ij0} is a vector of village and child baseline characteristics, and T_j indicates whether the village received the project (T_j = 1 for Batch 1 villages and T_j = 0 for matched control villages); β is the impact estimate of interest.
This estimate is a valid impact estimate under the assumption that the outcomes at endline, given baseline characteristics, would be equally distributed in the Batch 1 villages and matched control villages in the absence of the intervention. This assumption cannot be tested at baseline but can be tested at midline. At midline, the averages observed in the randomized control sample of Batch 3 villages are a valid estimate of the expected outcomes in Batch 1 villages in the absence of the intervention. Whether these estimates are equal to those of the matched control sample can be tested by estimating:
Y_{ij1} = α + π MC_j + γ' X_{ij0} + ε_{ij1}   (2)
An estimate of π that is not significantly different from zero indicates that the matched control sample would have been a valid control group at midline; here the estimation sample consists of the matched control (MC = 1) and randomized control (MC = 0) villages. If so, this also increases the ‘comfort factor’ of making the equivalent assumption at endline. If not, it provides an indication of the direction in which the estimates from (1) will be biased.
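Because assignment is at the village level, a simple and conservative version of this test is to collapse the data to village means and compare the two control groups; a sketch using Welch's t-statistic (the village means are invented for illustration):

```python
import math

def two_sample_t(a, b):
    """Welch t-statistic comparing village-level mean outcomes of the
    matched control sample (a) and randomized control sample (b);
    a t-statistic near zero supports the validity of the matched
    controls, mirroring a test of pi = 0 in the regression."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

matched = [0.52, 0.48, 0.55, 0.50]      # hypothetical village means
randomized = [0.51, 0.49, 0.53, 0.50]
print(round(two_sample_t(matched, randomized), 3))
```

Collapsing to village means sidesteps the within-village correlation of child outcomes at the cost of some power; the regression version with clustered standard errors is the workhorse alternative.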
At midline, the experimental design makes it possible to estimate the average treatment effect of the first 18 months of intervention with minimal assumptions. Batch 3 villages will still be a valid control group. Using data from Batch 1 (T = 1) and Batch 3 (T = 0), the average treatment effect can be estimated by:
Y_{ij1} = α + δ T_j + γ' X_{ij0} + ε_{ij1}   (3)

where δ is the average treatment effect.
Outcomes will be indicators of child development and of utilization of early childhood development services. The latter are of interest because the project aims to improve child development through improved access to early childhood development services. Considering both types of outcome variables provides an opportunity to evaluate the pathways of impact.
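The midline specification regresses the outcome on the treatment indicator and baseline controls. A minimal ordinary least squares sketch via the normal equations follows (the data are hypothetical, and the sketch omits the village-level clustering of standard errors that the actual analysis requires):

```python
def ols(X, y):
    """Ordinary least squares via the normal equations X'X b = X'y,
    solved with Gaussian elimination (partial pivoting).
    Rows of X are observations; columns are regressors."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    A = [row[:] + [b] for row, b in zip(XtX, Xty)]  # augmented matrix
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k + 1):
                A[r][j] -= f * A[c][j]
    b = [0.0] * k
    for c in reversed(range(k)):  # back substitution
        b[c] = (A[c][k] - sum(A[c][j] * b[j] for j in range(c + 1, k))) / A[c][c]
    return b

# columns: intercept, treatment indicator, one baseline control (hypothetical)
X = [[1, 1, 0.2], [1, 1, 0.5], [1, 0, 0.3], [1, 0, 0.6]]
y = [0.9, 1.1, 0.4, 0.7]
coef = ols(X, y)  # coef[1] is the treatment effect estimate
```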
Differential treatment effects
We will report results for the following sub-groups: boys/girls, poor/rich (defined on the basis of an asset index, with ‘poor’ being those in the bottom half) and enrolment in ECED services before commencement of the project. All sub-groups will be defined on the basis of information collected at baseline.
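The poor/rich sub-groups can be formed with a median split on the baseline asset index; a minimal sketch (the index values are hypothetical):

```python
def median_split(values):
    """Classify each observation as 'poor' (below the median of the
    baseline asset index) or 'rich' (at or above the median).
    Ties at the median are assigned to 'rich' in this sketch."""
    s = sorted(values)
    n = len(s)
    med = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return ["poor" if v < med else "rich" for v in values]

print(median_split([0.1, 0.9, 0.4, 0.7]))  # ['poor', 'rich', 'poor', 'rich']
```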
Deviations from planned design and consequences for analysis
As mentioned earlier, one deviation from the research design was that the randomized assignment to Batches 1 and 3 was not adhered to; in our case this happened in 20 out of 310 villages. Because the misallocation occurred in relatively few villages, it can be accommodated by using an instrumental variable (IV) estimator in (3). In this case, T_j in (3) is replaced with the actual batch (or the time period of the actual intervention at the time of the survey) and is instrumented with the randomly assigned T_j. A comparison between the non-instrumented estimate and the IV estimate is insightful here. To estimate the impact of the entire project at endline using (1), only the non-experimental estimator is available; the deviation between the non-instrumented estimator and the IV estimate will give an indication of the effect of using a non-experimental estimator on the parameter estimate.
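With a binary instrument and imperfect compliance, an IV estimator of this kind reduces to the Wald estimator: the difference in mean outcomes by random assignment divided by the difference in actual treatment rates by assignment. A sketch with invented village-level data:

```python
def wald_iv(assigned, treated, outcome):
    """Wald/IV estimate: effect of actual treatment, instrumented by
    random assignment. 'assigned' is the randomized batch indicator,
    'treated' the batch actually implemented, 'outcome' the outcome."""
    def mean(xs):
        return sum(xs) / len(xs)
    y1 = mean([y for z, y in zip(assigned, outcome) if z == 1])
    y0 = mean([y for z, y in zip(assigned, outcome) if z == 0])
    d1 = mean([d for z, d in zip(assigned, treated) if z == 1])
    d0 = mean([d for z, d in zip(assigned, treated) if z == 0])
    return (y1 - y0) / (d1 - d0)  # ITT effect scaled by compliance

# 6 villages; one village assigned to treatment did not implement (hypothetical)
assigned = [1, 1, 1, 0, 0, 0]
treated  = [1, 1, 0, 0, 0, 0]
outcome  = [0.8, 0.9, 0.5, 0.5, 0.4, 0.6]
print(wald_iv(assigned, treated, outcome))
```

In this example the intention-to-treat difference of about 0.23 is scaled up by the two-thirds compliance rate, giving an IV estimate of approximately 0.35; comparing it with the non-instrumented difference in means illustrates the point made above.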
Another significant deviation from the research design was that the baseline data collection did not take place before the Batch 1 villages started implementing the project: baseline data collection took place on average 6 months after Batch 1 villages started implementation, and midline data collection took place an average of 9 months after Batch 3 villages started implementation. The delays resulted from the MoNE’s prolonged procurement process to engage the survey fieldwork company and a desire to disburse the funds to the communities without delay. Given these timing issues, in order to assess the impact of the project we will construct measures of dosage, such as the number of months the project was active in a given village.
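A dosage measure of this kind can be computed directly from each village's project start date and the survey date; a minimal sketch (the dates are hypothetical):

```python
from datetime import date

def months_active(start, survey):
    """Whole months the project had been active in a village at the
    time of the survey (0 if it had not yet started)."""
    if survey < start:
        return 0
    months = (survey.year - start.year) * 12 + (survey.month - start.month)
    if survey.day < start.day:
        months -= 1  # most recent month not yet completed
    return months

print(months_active(date(2009, 4, 15), date(2009, 10, 20)))  # 6
```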
The consequence of the late baseline survey is that pre-intervention values cannot be observed in the baseline data for Batch 1 villages (for Batch 3 villages, the intervention had not yet occurred at the time of the baseline). This is particularly problematic for pre-intervention levels of outcome variables, which in equations (1), (2) and (3) would normally be included in X_{ij0}. With a late baseline, these outcome variables could already have been influenced by the intervention, leading to correlation between X_{ij0} and T_j, even in the experimental design.
Our approach to managing this issue is twofold. For the utilization variables, which we believe responded quickly to the opening of early childhood development services, we intend to reconstruct pre-project utilization from retrospective data collected in the baseline survey. For the child development variables, we intend to use the baseline data to estimate the impact of the intervention after 6 months by comparing Batch 1 with Batch 3 villages.
The consequence of the late midline survey is more limited. When conducting the midline analysis in (3), the comparison is no longer between 18 months of implementation and no project, but between 20 and 9 months of implementation: project implementation was delayed in Batch 3, and by the time the midline data were collected, Batch 3 villages had been implementing the intervention for only 9 months. We therefore intend to use the midline data to estimate the impact of the project after Batch 1 villages have received it for 20 months and Batch 3 villages for 9 months, that is, the impact of 11 months of differential exposure. For outcome variables that respond quickly to treatment, such as utilization, this is a problem, as both Batch 1 and Batch 3 villages received treatment; our strategy will be to focus on utilization variables that capture utilization over the entire evaluation period rather than only at the time of the survey. For outcome variables that respond more slowly to treatment, such as child development outcomes, we simply report the comparison as above, indicating that it reflects the 11-month difference in treatment duration.