PICFs used between 2015 and 2020 for human interventional studies were obtained from a convenience sample of research organisations (32 were contacted, of which 21 provided PICFs) and from the Australian New Zealand Clinical Trials Registry. PICFs written for self-consenters or for proxy consenters (parents/legal representatives) from all therapeutic areas and sponsor types were included. PICFs written for children or participants with learning disabilities, PICFs from non-interventional studies, and PICFs written in a language other than English were excluded. To maximise generalisability, PICFs were obtained from organisations located in all Australian states and territories, including coordinating centres, industry sponsors, public and private hospitals, medical research institutes, trial networks, and research groups. To minimise selection bias, research offices in large universities or teaching hospitals, which typically host well over 25 interventional studies per year, were asked to provide a random sample of up to 25 PICFs, selected from their databases using an online random number generator (a link to a generator was provided). Organisations with fewer than 25 studies, typically trial networks, trial units, and individual research teams, were asked to provide all available PICFs.
A total of 289 PICFs were collected and coded for PICF type (self-consent versus proxy consent), sponsor type (commercial versus non-commercial), study characteristics, and elements related to document format, layout, and language use, including illustrations and tables. Duplicates and ineligible PICFs were removed, leaving 278 PICFs. Because we received a higher-than-anticipated response from non-commercial oncology networks/units, our sample substantially overrepresented oncology trial activity in Australia. A random number generator was therefore used to select 30 non-commercial oncology PICFs for removal, leaving 248 PICFs for analysis.
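For transparency, the random selection step requires nothing more than a standard random number generator; a minimal Python sketch of an equivalent procedure is shown below (the identifiers and seed are hypothetical, not taken from the study):

```python
import random

# Hypothetical identifiers for the pool of non-commercial oncology PICFs
oncology_picf_ids = [f"PICF-{i:03d}" for i in range(1, 61)]

random.seed(2020)  # any fixed seed makes the illustrative draw reproducible
removed = random.sample(oncology_picf_ids, k=30)  # 30 PICFs drawn for removal
retained = [p for p in oncology_picf_ids if p not in removed]
```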
Consent forms were removed before page and word counts were recorded. Documents were then prepared for the calculation of readability scores using the online program ‘ReadablePro’ (formerly ‘Readability-Score’) [20]. PICFs were prepared in accordance with the program’s guidelines, including the removal of titles, headings, bulleted lists, tables, and any full stops embedded within sentences.
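The preparation rules are procedural, so the full-stop step in particular can be mimicked in code; the sketch below covers only that rule and assumes an illustrative abbreviation list (titles, headings, lists, and tables are assumed to have been removed by hand beforehand):

```python
def strip_embedded_full_stops(text: str) -> str:
    """Remove full stops embedded within sentences (e.g. in abbreviations),
    which readability tools would otherwise count as sentence boundaries."""
    # Assumed abbreviation set; extend as needed for a given PICF corpus
    for abbrev in ("e.g.", "i.e.", "etc.", "approx."):
        text = text.replace(abbrev, abbrev.replace(".", ""))
    return text
```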
Readability formulae are a widely accepted method for estimating how easily an average reader can comprehend a text [21]. For our analysis, we selected three well-established formulae. The primary outcome measure was the Flesch Reading Ease score [22], a continuous variable with possible scores of 0–100, where a higher score indicates easier reading. A score between 70 and 80 is equivalent to a grade 8 reading level.
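The Flesch Reading Ease score is computed from average sentence length and average word length in syllables; the published formula [22] is:

```latex
\mathrm{FRE} = 206.835
  - 1.015\,\frac{\text{total words}}{\text{total sentences}}
  - 84.6\,\frac{\text{total syllables}}{\text{total words}}
```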
The secondary analysis was based on the Flesch-Kincaid Grade Level [23] and the Simple Measure of Gobbledygook (SMOG) [24], calculated for the total sample and then for each group, with between-group comparisons. These measures estimate the years of education a person needs to understand a piece of writing. The Flesch formulae calculate scores from word and sentence length and are built into most word processing programs. The SMOG score is derived from the number of words with three or more syllables relative to the number of sentences.
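For reference, the published forms of the two secondary measures [23, 24] are:

```latex
\mathrm{FKGL} = 0.39\,\frac{\text{total words}}{\text{total sentences}}
  + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59
```

```latex
\mathrm{SMOG} = 1.0430\,\sqrt{\text{polysyllable count} \times \frac{30}{\text{sentence count}}} + 3.1291
```

where the polysyllable count is the number of words with three or more syllables.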
Regarding the rationale for using these formulae: the Flesch formulae are widely recommended in government health literacy and plain language guidance and are built into word processing programs. The SMOG formula is also easily accessible and is considered the most suitable for assessing health literature [25,26,27]: unlike other formulae, which test for 50–75% comprehension, SMOG tests for 100% comprehension. This is considered important for documents informing healthcare decisions, as these documents are not intended to be skim-read [27]. Consequently, SMOG tends to produce scores that are 1–2 grades higher than the Flesch formulae.
Both the Flesch-Kincaid Grade Level and SMOG correlate significantly with readability ratings made by health literacy experts [28]. However, readability scores have limitations: they do not measure factors such as cohesion between sentences, typography, and word choice [25, 28].

To extend our readability analysis, we selected ten additional best-practice measures for writing for consumers from the NHMRC PICF guidance [1] and Australian Commission on Safety and Quality in Health Care (ACSQHC) guidance [29, 30] and analysed each PICF for their presence. Although not research-specific, the ACSQHC guidance documents were considered relevant because Australian hospital accreditation against ACSQHC Standards now extends to clinical trial activity. Three measures (words per sentence, sentences per paragraph, and the use of passive voice) were calculated by the ReadablePro program. To provide an objective measure of ‘word choice’, we selected seven complex words (listed in Table 3) for which simpler alternatives are recommended in government guidance [31] and searched each PICF for at least one of these words. These words were selected on the basis that either they, or a simpler alternative, would be likely to appear in PICFs. For example, a PICF may state that ‘additional blood’ will be taken, when ‘extra blood’ is the recommended alternative. Although the use of scientific terms or measurements is sometimes unavoidable, they should be explained. We therefore searched PICFs for technical/medical terms or symbols used without a lay explanation that were likely to be unfamiliar to a lay audience (e.g. assay, subcutaneously, pharmacokinetics, peripheral vasodilatation, < 0.4/> 1.0 u/ml), or for which simpler alternatives are recommended (e.g. biopsy, inflammation) [32].
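The word-search step is straightforward to reproduce; the sketch below uses an abridged, illustrative word list (the study’s seven words are given in Table 3; only ‘additional’ is confirmed by the example above):

```python
import re

# Illustrative subset only; see Table 3 for the seven words used in the study
COMPLEX_WORDS = ["additional", "commence", "prior to"]

def contains_complex_word(picf_text: str) -> bool:
    """Return True if the PICF uses at least one flagged complex word."""
    return any(
        re.search(rf"\b{re.escape(word)}\b", picf_text, flags=re.IGNORECASE)
        for word in COMPLEX_WORDS
    )
```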
Descriptive summary statistics (mean [SD] and median [IQR]) were used as appropriate.
Readability scores were approximately normally distributed, so the unpaired Student’s t-test was used for comparisons. Page and word counts were non-normally distributed, so the Mann-Whitney U test was used. All statistical analyses were performed using Stata version 15 (StataCorp, College Station, TX, USA), and p values < 0.05 were considered statistically significant.
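Although the analyses were run in Stata, the two comparisons map directly onto standard library calls; an equivalent sketch in Python with SciPy (all data values hypothetical) is:

```python
import numpy as np
from scipy import stats

# Hypothetical Flesch Reading Ease scores for two groups
flesch_commercial = np.array([48.2, 51.0, 45.7, 50.3, 47.9])
flesch_noncommercial = np.array([53.1, 49.8, 55.6, 52.4, 54.0])

# Unpaired Student's t-test for the near-normally distributed scores
t_stat, p_t = stats.ttest_ind(flesch_commercial, flesch_noncommercial)

# Mann-Whitney U test for the non-normally distributed page counts
pages_commercial = np.array([18, 24, 31, 22, 27])
pages_noncommercial = np.array([12, 15, 20, 11, 16])
u_stat, p_u = stats.mannwhitneyu(pages_commercial, pages_noncommercial,
                                 alternative="two-sided")
```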