The analysis of the primary outcome will be performed after 3 years of follow-up. The secondary outcomes will be evaluated after both 1 and 3 years of follow-up. Analyses will be performed by the investigators of the iHealth-T2D study group (MM for the 1-year analyses), who were blinded to the intervention, on a clean anonymised data set (JCC and AK). The latest version of the R statistical software package will be used. Tests will be two-sided, and p-values < 0.05 will be considered statistically significant. All statistical analyses will be adjusted for confounders registered at baseline, namely age and sex. We will not adjust for multiple testing as we pre-defined the primary and secondary outcomes. Data will be reported in line with the Consolidated Standards of Reporting Trials (CONSORT) 2010 statement: extension to cluster randomised trials.
The analyses will be performed according to the intention-to-treat (ITT) principle with data from all clusters and participants enrolled in the study [17, 18]. Data of participants who attended at least one post-baseline assessment will be analysed according to their initially assigned study arm, regardless of their adherence. Participants who withdrew their consent will be excluded from the ITT analyses, and the number of participants who withdrew their consent and their reasons for withdrawing will be reported. The patterns of missing data for the primary and secondary outcomes and, if known, the reasons for missingness will be summarised for both treatment arms. The nature and pattern of missing data will be explored. If the data are assumed to be missing at random, they will be imputed by multiple imputation and the primary analysis will be performed on the imputed data. If the data are not missing at random, a “best case/worst case” sensitivity analysis will be used. If multiple imputation is used for one or more outcomes, we will use the jomo wrapper in the mice package in R; the variables selected as predictors for imputation include known predictors of T2D. These comprise the covariates included in the main analysis model (sex, age) and the auxiliary variables country, setting, socio-economic status, pack-years of smoking, alcohol consumption, metabolic equivalents (METs) of physical activity, waist circumference and HbA1c. Missing values will be imputed separately by allocated randomisation group, and the imputation will respect the multi-level structure of the data.
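The “best case/worst case” idea can be illustrated with a minimal sketch for a binary outcome such as incident T2D. The record layout, the arm labels, the function names and the toy data are all hypothetical, and the sketch ignores clustering and covariates; it only shows how the two extreme scenarios bound the crude risk difference.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout: (arm, outcome), where arm is "intervention" or
# "control" and outcome is 1 (developed T2D), 0 (did not) or None (missing).
Record = Tuple[str, Optional[int]]


def impute_extreme(records: List[Record], scenario: str) -> List[Tuple[str, int]]:
    """Fill missing binary outcomes under an extreme assumption.

    "best":  missing -> no event in the intervention arm, event in control.
    "worst": missing -> event in the intervention arm, no event in control.
    """
    fills = {
        "best": {"intervention": 0, "control": 1},
        "worst": {"intervention": 1, "control": 0},
    }
    if scenario not in fills:
        raise ValueError("scenario must be 'best' or 'worst'")
    fill = fills[scenario]
    return [(arm, fill[arm] if y is None else y) for arm, y in records]


def risk(records: List[Tuple[str, int]], arm: str) -> float:
    """Crude cumulative incidence in one arm (no cluster adjustment)."""
    outcomes = [y for a, y in records if a == arm]
    return sum(outcomes) / len(outcomes)


def risk_difference(records: List[Tuple[str, int]]) -> float:
    """Intervention risk minus control risk."""
    return risk(records, "intervention") - risk(records, "control")


# Toy data, invented purely for illustration.
toy: List[Record] = [
    ("intervention", 1), ("intervention", 0), ("intervention", 0), ("intervention", None),
    ("control", 1), ("control", 1), ("control", 0), ("control", None),
]

best_rd = risk_difference(impute_extreme(toy, "best"))
worst_rd = risk_difference(impute_extreme(toy, "worst"))
```

If the conclusion is the same under both scenarios, it is unlikely to be driven by the missing outcomes.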
Baseline characteristics of both study arms will be presented in a table, by sex and country. The baseline characteristics will not be tested for statistical differences between study arms. They will be reported as arithmetic means and standard deviations (normally distributed numerical data), medians and interquartile ranges (non-normally distributed numerical data) or percentages and numbers (categorical data). Normality of data distributions will be inspected visually by plotting histograms, and we will assess the deviation from normality by the Shapiro–Wilk test. In case of a p-value < 0.05, indicating deviation from normality, the data will be transformed towards normality before any statistical analyses.
Descriptive characteristics to report at baseline include age (years), setting (%), socio-economic status (%), smoking (pack-years), alcohol consumption (units/week), physical activity (MET/week), BMI (kg/m²), waist circumference (cm), HbA1c (%) and glucose (mmol/L). The population size and number of missing observations will also be reported.
Analyses of the primary outcome
Cumulative incidence of T2D will be summarised and compared between the treatment arms using random effects logistic regression to estimate the odds ratios (OR) and 95% CIs. The R package lme4 for generalised linear mixed models will be used, which uses frequentist methods to estimate the fixed and random effects; the default correlation structure is unstructured. In addition, we will evaluate the intraclass correlation coefficients to assess the cluster variance; the R package sjstats, version 0.17.5, will be used. The model will include the randomisation stratum site as a random effect and treatment and country as fixed effects. The effectiveness of the lifestyle intervention will be reported by the risk difference, the number needed to screen to identify one person at “high risk” of developing diabetes and the number needed to treat to prevent or delay one case of T2D. The risk difference will be derived with the modified log-Poisson approach. The number needed to treat will be derived by the R package nnt, which is based on the restricted mean survival time in the control group divided by the difference in restricted mean survival time between the treatment and control groups up to 3 years of follow-up. The Wilson score method will be used to calculate CIs. All analyses will be adjusted for the confounders age and sex.
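Three of the quantities named above have simple closed forms, sketched here using only the Python standard library. This is an illustration under stated assumptions, not the analysis code: the number needed to treat is computed as the reciprocal of a crude absolute risk reduction rather than with the restricted-mean-survival-time approach of the nnt package, the intraclass correlation uses the common latent-variable formula for a random-intercept logistic model (site variance divided by site variance plus π²/3), and all function names are hypothetical.

```python
import math
from typing import Tuple


def wilson_ci(events: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a single proportion (default: 95%)."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    p = events / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def number_needed_to_treat(risk_control: float, risk_treated: float) -> float:
    """Reciprocal of the absolute risk reduction (simplified, not RMST-based).

    Returns infinity when the intervention shows no benefit.
    """
    arr = risk_control - risk_treated
    return 1 / arr if arr > 0 else math.inf


def latent_icc(site_variance: float) -> float:
    """ICC on the latent scale for a random-intercept logistic model."""
    if site_variance < 0:
        raise ValueError("variance must be non-negative")
    return site_variance / (site_variance + math.pi ** 2 / 3)


# Hypothetical numbers, chosen only to exercise the functions.
lo, hi = wilson_ci(50, 100)
nnt = number_needed_to_treat(risk_control=0.20, risk_treated=0.15)
icc = latent_icc(0.5)
```

None of these helpers accounts for clustering; in the trial itself the mixed model and the cited R packages would supply the estimates.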
Analyses of the secondary outcomes
The secondary outcomes are continuous and will be reported as mean and SD in each of the two treatment groups. The differences between the two treatment arms will be estimated with a multilevel linear mixed-effects regression model. The models will include the stratification variable country as a fixed effect and a random effect for clusters. The estimates will be presented with their associated 95% confidence intervals (CIs) and p-values for comparison between the treatment groups. The models will additionally be adjusted for age and sex, and the baseline values will be reported.
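One way to write down the model described above, for participant i in cluster j, is given below. The notation is an assumption introduced here for illustration (it does not appear in the protocol): arm is the treatment indicator, country enters as a set of indicator variables, and u is the cluster-level random intercept.

```latex
Y_{ij} = \beta_0
       + \beta_1\,\mathrm{arm}_{j}
       + \boldsymbol{\beta}_2^{\top}\,\mathbf{country}_{j}
       + \beta_3\,\mathrm{age}_{ij}
       + \beta_4\,\mathrm{sex}_{ij}
       + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Under this formulation, the coefficient on arm is the adjusted mean difference between the treatment arms, and its 95% CI and p-value are the quantities to be reported.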
Treatment compliance will be reported for the intervention arm as an explanatory variable. It will be reported as the number of lifestyle modification (LSM) sessions a participant attended. In addition, changes in dietary intake will be reported. Twenty-four-hour dietary recalls and food frequency questionnaires were performed in the treatment arm only. Dietary variables will, therefore, be reported as a change from baseline in the treatment arm.
Both absolute and relative risk reduction will be compared for subgroups of participants included in the study based upon a high risk for T2D according to waist circumference measurements (waist circumference ≥ 100 cm in India and Pakistan; ≥ 90 cm in Sri Lanka) and those included based upon HbA1c levels (6.0–6.4% inclusive). The interaction of the treatment arm with sex, setting, socio-economic status, baseline waist circumference and HbA1c levels will be assessed. If there is an interaction, effect estimates and p-values will be presented by subgroups.
Sensitivity analyses will be performed to identify potentially extreme sites, because the extreme deviation of one site from the other sites may have a large impact on the overall results. This will be done by leaving out one site at a time; a site will be considered extreme if the estimate changes by > 10% when it is left out. In addition, complete case analyses will be conducted to assess the robustness of the results.
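The leave-one-site-out procedure can be sketched as follows. For self-containedness the sketch uses a crude odds ratio pooled from aggregated 2×2 counts instead of refitting the mixed model, and it measures the change relative to the full-data odds ratio; the protocol does not specify the scale, so that choice, the function names and the toy counts are all assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical layout per site:
# (events_intervention, n_intervention, events_control, n_control)
SiteCounts = Dict[str, Tuple[int, int, int, int]]


def pooled_odds_ratio(sites: SiteCounts) -> float:
    """Crude odds ratio from counts summed over sites (no mixed model)."""
    ev_i = sum(v[0] for v in sites.values())
    n_i = sum(v[1] for v in sites.values())
    ev_c = sum(v[2] for v in sites.values())
    n_c = sum(v[3] for v in sites.values())
    return (ev_i * (n_c - ev_c)) / ((n_i - ev_i) * ev_c)


def extreme_sites(sites: SiteCounts, threshold: float = 0.10) -> List[str]:
    """Sites whose removal shifts the pooled estimate by more than threshold."""
    full = pooled_odds_ratio(sites)
    flagged = []
    for name in sites:
        rest = {k: v for k, v in sites.items() if k != name}
        change = abs(pooled_odds_ratio(rest) - full) / full
        if change > threshold:
            flagged.append(name)
    return flagged


# Toy counts, invented for illustration: site E behaves differently.
toy_sites: SiteCounts = {
    "A": (10, 100, 20, 100),
    "B": (10, 100, 20, 100),
    "C": (10, 100, 20, 100),
    "D": (10, 100, 20, 100),
    "E": (30, 100, 20, 100),
}

flagged = extreme_sites(toy_sites)
```

In an actual analysis, the pooled estimate would be replaced by the treatment effect from the model refitted without each site in turn.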
Since lifestyle interventions are generally considered to be safe, no (serious) adverse events are expected. Any adverse events that do occur will be reported per incident, with the number per group and a description of the event.