The reliability of the ankle brachial index: a systematic review

Background The ankle brachial index (ABI) is widely used in clinical practice as a non-invasive method to detect the presence and severity of peripheral arterial disease (PAD). Current guidelines suggest that it should be used to monitor potential progression of PAD in affected individuals. As such, it is important that the test is reliable when used for repeated measurements, by the same or different health practitioners. This systematic review aims to examine the literature to evaluate the inter- and intra-rater reliability of the ABI. Methods A systematic search of MEDLINE, EMBASE and CINAHL Complete was conducted to 20 January 2019. Two authors independently reviewed and selected relevant studies and extracted the data. Methodological quality was determined using the Quality Appraisal of Reliability (QAREL) Checklist. Results Fifteen studies of ABI reliability in a range of patient populations were identified as suitable for inclusion in the review: seven considered inter-rater reliability, four intra-rater reliability, and four studies evaluated both inter- and intra-rater reliability. Inter-rater reliability was found to be highly variable, with intraclass correlation coefficients (ICCs) ranging from poor to excellent (ICC 0.42–1.00), while intra-rater reliability also demonstrated considerable variation, with ICCs from 0.42–0.98. Meta-analysis was not possible due to the lack of statistical information reported. Conclusions Results of included studies suggest the inter- and intra-tester reliability of the ABI is acceptable. However, inconsistencies in obtaining systolic pressure measurements, calculating ABI values, and incomplete reporting of methodologies and statistical analysis make it difficult to determine the validity of the results of included studies. Further research, with more consistent reliability methodology, statistical analysis and reporting conducted in populations at risk of PAD is needed to conclusively determine the ABI reliability.

an objective measurement of peripheral blood flow [7,8]. The ABI represents the ratio of ankle to brachial systolic pressure. It is recommended that it be calculated by dividing the higher of the systolic pressures of the dorsalis pedis and tibialis posterior vessels at the ankle by the higher of the systolic pressures measured in the brachial arteries of both arms [7,8].
The ABI is widely used to screen for PAD in different clinical settings and by different health professionals, from general medical practitioners to specialist vascular technicians [9,10]. Reliability of the test for accurate ongoing monitoring of lower limb vascular status has the potential to be affected by a number of factors. Because the test is operator-dependent, these include the experience and skills of the clinician, particularly as multiple clinicians are frequently involved in ongoing monitoring measurements [11,12]. There are also a number of types of equipment (e.g. automated versus manual) and methods used to measure ankle and arm blood pressures (e.g. stethoscope, Doppler, photoplethysmography probe), with variable findings as to whether the results are interchangeable [13-16]. The pre-test protocol and test environment have also been demonstrated to affect the resting ABI at measurement, with variations in body position [17], recency of tobacco smoking, caffeine intake [18,19] and exercise [20,21], and pre-measurement rest time [22] all likely to introduce error to the measurement and affect the test-retest reliability.
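The recommended calculation described above can be written compactly. The symbols are introduced here for illustration only and do not appear in the source: P_DP and P_PT denote the systolic pressures at the dorsalis pedis and tibialis posterior arteries of the limb being assessed, and P_B,L and P_B,R denote the left and right brachial systolic pressures.

```latex
\mathrm{ABI} = \frac{\max\left(P_{\mathrm{DP}},\; P_{\mathrm{PT}}\right)}{\max\left(P_{\mathrm{B,L}},\; P_{\mathrm{B,R}}\right)}
```

As a hypothetical illustration, ankle pressures of 110 and 118 mmHg with brachial pressures of 130 and 126 mmHg would give 118/130, or approximately 0.91.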

Objectives
Given that the ABI is the recommended method for screening for the presence and progression of PAD, it is important that it is reliable. Therefore, the aim of this review was to systematically evaluate the literature to determine the inter- and intra-rater reliability of the ABI in adults.

Search strategy
A search of relevant biomedical journal databases, accessed through the University of Newcastle library website, was performed to identify studies that considered the reliability of ABI measurement, from database inception to January 2019, using MEDLINE (1946+), EMBASE (1947+) and CINAHL Complete. Truncated versions of some search terms were used to ensure that relevant studies were included (Table 1).

Inclusion and exclusion criteria
The review was conducted with reference to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [23]. The following criteria had to be satisfied for inclusion in the review: published original research evaluating the reliability of the ABI in adults. Studies were excluded if the test-retest time frame made it likely that results would be affected by disease progression (e.g. > 12 months). No language restrictions were applied to the database searches.

Other sources
Hand searching of the reference list of appropriate articles was also conducted.

Data collection and analysis
All abstracts obtained were assessed independently by SC and SL for inclusion. There were no instances of disagreement between reviewers, so arbitration by a third person (VC) was not necessary. Data extraction was performed by SC and SL. It was pre-determined that a meta-analysis of reliability outcomes for inter- and intra-rater reliability would be conducted provided there were sufficient studies that reported the estimator of interest and a measure of uncertainty for this estimator (e.g. standard error, 95% confidence interval, non-truncated p-value). Given the expectation of a high degree of study heterogeneity, we believed a fixed effect meta-analysis would generally not be appropriate, so we aimed to pool estimates only by using a random effects approach, provided there were at least 5 studies [24].
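To illustrate why each study must supply both an estimate and a measure of its uncertainty, the following is a minimal sketch of random effects pooling. It assumes the DerSimonian-Laird estimator of between-study variance, which is one common choice; the review does not state which estimator was planned. The function name and the input values are hypothetical and are not data from the included studies. In practice, ICCs would usually be transformed (for example with Fisher's z) before pooling.

```python
import math


def random_effects_pool(estimates, standard_errors):
    """Pool study estimates with a DerSimonian-Laird random effects model.

    Returns (pooled_estimate, pooled_standard_error, tau_squared).
    """
    k = len(estimates)
    if k < 2 or k != len(standard_errors):
        raise ValueError("need matching estimates and standard errors for >= 2 studies")

    # Fixed effect (inverse-variance) weights and pooled estimate.
    weights = [1.0 / se ** 2 for se in standard_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w

    # Cochran's Q and the moment estimate of between-study variance.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau_squared = max(0.0, (q - (k - 1)) / c)

    # Random effects weights add tau^2 to each within-study variance.
    re_weights = [1.0 / (se ** 2 + tau_squared) for se in standard_errors]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum_re
    return pooled, math.sqrt(1.0 / sum_re), tau_squared


# Hypothetical transformed estimates from five studies.
pooled, se, tau2 = random_effects_pool(
    [0.40, 0.60, 0.80, 0.90, 0.70], [0.10, 0.10, 0.10, 0.10, 0.10]
)
print(f"pooled={pooled:.3f}, se={se:.3f}, tau^2={tau2:.3f}")
```

Without a standard error (or a confidence interval from which one can be derived) a study cannot be assigned a weight, so it cannot enter a calculation of this kind.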

Methodological quality assessment
The studies that met the inclusion criteria were appraised for risk of bias using the Quality Appraisal of Reliability (QAREL) Checklist and qualitative methodological assessment [25]. All full-text papers were assessed for methodological quality independently by two reviewers (SC/SL). As there were no disagreements, arbitration by a third reviewer (VC) was not necessary.

Characteristics and overview of included studies
The 15 studies in this review included a total of 916 participants, with data collected from a combination of one and both lower limbs (1396 limbs in total). Two studies did not state the number of limbs included [52,53]. Eleven studies assessed inter-rater reliability [12, 13, 16, 46-50, 56, 57], and eight studies reported intra-rater reliability [13,16,52-57]. The characteristics of included studies are described in Table 2. Eleven studies reported participants' gender, with more men (n = 416, 56.4%) overall than women, whilst gender was unreported in four studies [12,46,49,50]. Most of the studies included predominantly older participants (age range 41-92 years) [12, 13, 16, 47-49, 51, 53-55, 57]; however, two studies recruited only younger adults (age range 22-30 years) [46,56], one study included  year olds [52] and one study did not report participants' ages [50]. The majority of studies [12, 47-51, 55, 57] included only participants with suspected PAD, or risk factors for atherosclerosis; three studied a mixed population including those without risk factors or clinical indicators of PAD [13,16,52]; two studies included only participants with diabetes [53,54], and two studies included only healthy individuals [46,56]. There was little consistency in the training and qualifications of the raters used, with experience ranging from students [46,47] to experienced vascular technicians and/or vascular specialist doctors [12,13,48,54,57]. Six studies did not state the background of the personnel performing the test [49, 51-53, 55, 56]. The majority of the studies used Doppler and a manual sphygmomanometer to measure systolic blood pressures [12, 13, 16, 46-49, 51-53, 56]; however, three studies used an automated device to obtain some or all of the pressure readings [54,55,57] and one study did not report the method used [50].
The reported pre-measurement rest time varied from 5 min [55] to 15 min [48], with seven studies not reporting a period of rest before testing commenced [44,47,49,50,52-54]. The time between repeat testing varied from 5 min [46,56] to 4 weeks [52]; six studies did not report time between repeated measures [12, 49-51, 54, 55]. Several different methods were used to calculate the ABI. The majority of studies [47-49, 51, 52, 56, 57] divided the highest ankle pressure by the higher brachial pressure measurement, two [13,16] used the highest ankle pressure and the mean brachial value, and one used the lowest ankle pressure and the highest brachial pressure [55]. One study used a fully automated device that calculated the ABI value [54], and four did not state how the ABI was calculated [12,46,50,53].
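The practical effect of these differing calculation methods can be seen by applying each to the same set of pressures. The sketch below is illustrative only: the function name, dictionary labels and pressure values are hypothetical, and the three formulas simply follow the descriptions given above.

```python
from statistics import mean


def abi_by_method(ankle_pressures, brachial_pressures):
    """Return the ABI for one limb under three calculation methods.

    Pressures are systolic values in mmHg: the ankle list holds the dorsalis
    pedis and tibialis posterior readings, the brachial list holds both arms.
    """
    return {
        "highest ankle / higher brachial": max(ankle_pressures) / max(brachial_pressures),
        "highest ankle / mean brachial": max(ankle_pressures) / mean(brachial_pressures),
        "lowest ankle / highest brachial": min(ankle_pressures) / max(brachial_pressures),
    }


# Hypothetical readings for a single limb.
for method, value in abi_by_method([118, 104], [130, 126]).items():
    print(f"{method}: {value:.2f}")
```

With these hypothetical readings the same limb yields values of roughly 0.91, 0.92 and 0.80 depending on the method, which shows how the choice of calculation could by itself shift a result across a diagnostic boundary.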

Methodological quality
The quality of studies was variable with regard to reported blinding of raters, order of examination and the time between repeated measurements, with no study clearly addressing all of these variables. While most studies used appropriate statistical measures of agreement, reporting of results was frequently incomplete and the true extent of reliability could not be determined (Table 3).

Meta-analysis
A number of the eligible papers identified lacked sufficient data relating to the main outcomes to allow for inclusion in a meta-analysis. For example, the paper by Chesbro et al. [46] provided no details on the intra-rater reliability of measurements taken using a Doppler, which was the main outcome being assessed in this review. Similarly, papers by De Graaff et al. [57] and Demir et al. [52] detailed no measure of variability for the intraclass correlation coefficients (ICC) reported, which is required when pooling results in a meta-analysis. It is not clear whether Chesbro et al. [46,56] used data from the same population in both studies, and the authors did not respond to a request for clarification. Finally, for the paper by Aboyans et al. [13], the type of ICC calculated was not reported, and while pooling of these data would be possible, understanding which ICC was used is preferred to allow for accurate and appropriate calculation of the standard error. As only a small number of eligible papers were identified, data from all articles would be required to allow for appropriate pooling of ICCs. Thus, as a consequence of the small number of papers reviewed and the insufficient data reported by several of the papers, it was not possible to conduct a meta-analysis as part of this review. None of the authors responded to requests for missing data. A narrative review of results is presented instead.

Inter-rater reliability
Inter-rater reliability results are included in Table 2. Statistical methods for calculating reliability were inconsistent. Of the eleven included studies, five reported levels of agreement with ICCs [13,16,46,56,57]. Of these, only three [13,46,56] reported 95% confidence intervals, which limits the interpretation of reliability in the context of clinically meaningful results. Based on ICC values alone, inter-rater reliability was highly variable, ranging from poor (ICC: 0.42) [16], to excellent (ICC: 1.0) [46].
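For readers unfamiliar with how an ICC is derived, the following sketch computes one specific form, ICC(2,1) (two-way random effects, absolute agreement, single rater), from a table of repeated ABI readings. This form is chosen purely for illustration; as noted elsewhere in this review, the included studies did not always report which type of ICC they used. The function name and the ABI values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per limb, each holding one value per rater.
    """
    n = len(ratings)      # number of limbs (subjects)
    k = len(ratings[0])   # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 limbs and >= 2 raters, with no missing values")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: between limbs, between raters, residual.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical ABI readings: five limbs, each measured by two raters.
abi_readings = [[0.92, 0.95], [1.10, 1.05], [0.78, 0.85], [1.01, 0.97], [0.66, 0.70]]
print(f"ICC(2,1) = {icc_2_1(abi_readings):.2f}")
```

A point estimate of this kind says nothing about its own precision, which is why the absence of confidence intervals in several studies limits interpretation.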
Other estimates of reliability reported in included studies were coefficient of variation between raters [12,49] (ranging from 3.2 to 5.9%), inter-observer reliability of 10% for raters [48], and a moderate Pearson's correlation coefficient of 0.52 in a population with suspected PAD [50].
Of the remaining studies, one demonstrated statistically significant differences in ABI between raters in a population with severe PAD and in those with no disease, which did not occur in those participants with mild to moderate PAD [47], suggesting increased reliability with this disease state. In contrast, another paper reported Kappa coefficients of 0.4 (low agreement) for healthy limbs, 0.7 (good agreement) for limbs with PAD, and 0.43 (moderate agreement) for limbs with medial arterial calcification (MAC) (p < 0.001 for all values) [51].
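A Kappa coefficient measures agreement between raters on a classification rather than on the ABI value itself. The sketch below assumes Cohen's kappa for two raters and an ABI cut-off of 0.90 or below for classifying a limb as having PAD; both are assumptions made for illustration, since the source does not specify the Kappa variant or threshold used in [51]. All names and readings are hypothetical.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters assigning one category label per limb."""
    labels_a, labels_b = list(labels_a), list(labels_b)
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("both raters must label the same, non-empty set of limbs")

    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each rater's marginal proportions.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1.0 - expected)


PAD_THRESHOLD = 0.90  # assumed cut-off, not taken from the source


def classify(abi):
    """Label a limb from its ABI using the assumed threshold."""
    return "PAD" if abi <= PAD_THRESHOLD else "no PAD"


# Hypothetical ABI readings for six limbs from two raters.
rater_a = [0.82, 0.95, 1.10, 0.88, 1.02, 0.91]
rater_b = [0.85, 0.89, 1.08, 0.93, 1.00, 0.94]
kappa = cohens_kappa(map(classify, rater_a), map(classify, rater_b))
print(f"kappa = {kappa:.2f}")
```

Because small numerical differences near the threshold flip the classification, two raters can produce similar ABI values yet show low Kappa agreement.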

Discussion
[Table 3 (partial). QAREL checklist items: Was the test evaluated in a sample of subjects who were representative of those to whom the authors intended the results to be applied? Was the test performed by raters who were representative of those to whom the authors intended the results to be applied? Were raters blinded to the findings of other raters during the study? Was the test applied correctly and interpreted appropriately? Key: P, Partly; NA, Not applicable.]

The findings of this review are that the inter- and intra-tester reliability of the ABI across a number of mixed populations appears to be acceptable; however, statistical tests of reliability in included papers were heterogeneous, and levels of statistical reporting were inconsistent and incomplete. This makes interpretation of the reliability of the ABI in the context of clinical detection, evaluation and ongoing monitoring of peripheral arterial supply challenging, and prevented meta-analysis. For example, where studies lack 95% confidence intervals for ICCs, the validity of interpretation of the value is reduced, as it fails to provide the lowest level of reliability that it represents. Similarly, for the coefficient or estimate of variation, values between 3.2 and 15.8% were reported. Whilst this is considered an acceptable level of variation for many clinical tests, for the ABI it can represent a range of values that may indicate both normal and pathological results, which could reduce the ability of the ABI to reliably determine the presence and extent of PAD. For example, assuming a variation of 15%, an ABI of 1.0 (which is considered 'borderline' when ABI is used as a screening tool [6]) could represent a true value between 0.85 (indicative of PAD) and 1.15 ('normal'). Further complicating the interpretation and generalisability of the inter- and intra-rater reliability results of included studies was the heterogeneity of participant populations. Whilst the majority of studies included older people with PAD risk factors or suspected PAD, three studies also included healthy participants [13,16,52], and two used an exclusively young and healthy population [46,56]. In clinical practice, ABI is used to evaluate peripheral arterial supply in people with risk factors for atherosclerosis, and in those with clinical signs and symptoms of PAD.
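The arithmetic behind the 15% example can be reproduced for any measured ABI and level of variation. This is a minimal sketch that treats the reported variation as a simple symmetric percentage band around the measured value, which is an assumption made for illustration; the function name is hypothetical.

```python
def plausible_abi_range(measured_abi, variation_percent):
    """Return the (low, high) band implied by a percentage variation."""
    delta = measured_abi * variation_percent / 100.0
    return measured_abi - delta, measured_abi + delta


# The example from the text: an ABI of 1.0 with 15% variation.
low, high = plausible_abi_range(1.0, 15)
print(f"{low:.2f} to {high:.2f}")  # 0.85 to 1.15

# The smallest variation reported across the included studies (3.2%).
low, high = plausible_abi_range(1.0, 3.2)
print(f"{low:.3f} to {high:.3f}")
```

At the lower end of the reported variation the band stays within the range described above as normal or borderline, whereas at the upper end it spans both pathological and normal values.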
The variation in the disease status of participants across the studies included in this review makes it difficult to evaluate how the studies' findings apply to the people in whom the ABI would be used clinically. The study that reported near-perfect inter- and intra-tester reliability included only healthy individuals under the age of 30 [56]. This population would not typically undergo vascular screening, and the results obtained do not indicate the ability of the ABI to perform reliably in the presence of pathology, where the result is likely to be lower and the change in result that indicates worsening pathology is therefore likely to be small. In contrast, inter-tester and intra-tester reliability was found to be poor in several populations in which this test is recommended, including people with diabetes and without MAC [51], and older people with risk factors for PAD [16].
Methodological differences between studies are also likely to have contributed to variable reliability outcomes, with automated oscillometric devices demonstrating marginally better reliability than manual assessment using Doppler [49,55], while Doppler evaluation was found to be more reliable than the use of pulse palpation [13] or stethoscope [46]. Higher ABI reliability was found in more experienced raters [47]. Whilst most of the studies reported that participants rested for 5-15 min prior to testing [12,13,16,46,48,51,55-57], six studies did not describe any pre-test preparation [47,49,50,52-54]. Only one paper took steps to ensure that participants did not consume alcohol, caffeine or tobacco (which are known to affect blood pressure) in the two hours prior to testing [55]; in the remaining studies, such intake may have affected measurements, particularly when taken across two different testing sessions. This lack of reporting of the methodology used to obtain systolic blood pressure measurements makes it difficult to compare results across the included studies, as it is unknown how much external factors are likely to contribute to measurement variability.
Two papers identified the presence of diabetes mellitus as a factor that may affect reliability of the ABI [12,51]; however, only one study included a large enough sample of this cohort to perform statistical tests [51]. This study, which used only participants with diabetes, reported the Kappa coefficient for inter-tester measures for participants classed as having PAD or not, rather than performing ICCs on the measures obtained. The authors reported 'good' reproducibility of the ABI (κ 0.7) in people classified by their ABI measurement as having PAD, but low reproducibility in those without PAD and in those with MAC. Previous research has also shown that people with diabetes demonstrate a different response to pre-measurement rest [22], and that brachial blood pressure measurement is also less reliable in these individuals [58]. Diabetes-related autonomic neuropathy has been shown to affect blood pressure regulation, with a lack of vasoconstriction arising from reduced sympathetic input, particularly in response to changes in temperature and position [59,60].

Limitations
While the search methods employed in this study were designed to be robust, there may be some evidence that was not captured, for example unpublished data. Further limitations to this study are the inability to perform meta-analysis in order to obtain a quantitative analysis of the available reliability data for the ABI, and the inability to perform any sub-analyses relating to individual populations, such as those with diabetes, or to methods of measurement, such as automated or manual methods. Furthermore, there has been some disagreement in the literature about which pressure measurement should be used to calculate the ABI [61,62], with no studies exploring the effect of calculation method on reliability. The method of calculation therefore cannot be excluded as a factor affecting reliability that has not been considered by this review.

Conclusion
Results of included studies suggest the inter- and intra-tester reliability of the ABI is acceptable. However, inconsistencies in obtaining systolic pressure measurements, calculating ABI values, and incomplete reporting of methodologies and statistical analysis make it difficult to determine the validity of the results of included studies. Further research of ABI reliability is required, using a more consistent approach to study design and implementation and more detailed reporting of results, in populations with vascular pathology and at risk of PAD. Based on currently available data, clinicians should ensure that they interpret ABI results in the context of other vascular assessment findings, and that patient management is not based upon this measurement alone.