The agreement and repeatability of measurements of ankle joint dorsiflexion and poplietal angle in healthy adolescents
Journal of Foot and Ankle Research volume 15, Article number: 67 (2022)
The intra-rater repeatability and inter-rater agreement of orthopaedic measurements are important for estimating injury risk and planning appropriate treatment. In clinical practice, it is often unavoidable to trust the measurements of other health professionals.
This study tested the agreement and repeatability of measurements of the dorsiflexion of the foot, dorsiflexion with 90-degrees knee flexion, and popliteal angle test in healthy adolescents performed twice by three raters differing in clinical experience. Three raters, i.e., an orthopaedics specialist (16 years of experience), a resident medical doctor in orthopaedics (4 years of experience), and a physiotherapy student (1 year of experience) measured the ankle joint dorsiflexion and the popliteal angle in 142 healthy adolescent subjects.
The student outperformed more experienced raters by displaying good repeatability for all the evaluated parameters. The orthopaedics specialist failed to replicate the measurements of the left ankle joint passive dorsiflexion and the left popliteal angle. The medical resident in orthopaedics displayed a lack of repeatability in evaluating the right ankle joint dorsiflexion with the knee joint bent. Kendall’s W value for all parameters ranged 0.66–0.78, indicating a good inter-rater agreement.
The study highlights that measurements of the ankle joint dorsiflexion and popliteal angle test by different health professionals can generally be trusted. It indicates that novice health professionals could potentially evaluate such parameters in healthy subjects without a quality loss.
“Practice makes better”: this common and old saying is a basic rule taught to every resident starting their training. This is certainly true in various fields, including medical biology and medicine. Appropriate training, increasingly often supported by simulation-based medical education, is essential to perform qualified, accurate and standardized measurements and procedures [1,2,3]. On the other hand, health professionals, including experienced clinicians, are frequently overworked, forced to switch tasks, or perform concurrent multitasking. These can have varying detrimental effects on task performance and increase the risk of error [4,5,6,7]. Moreover, experienced health professionals may more frequently [8,9,10] be subject to some biases, among which the most common include anchoring bias (the tendency to rely on the pre-existing assumptions when making clinical decisions), availability bias (the tendency to weigh the likelihood of things by how easily they are recalled), and confirmation bias (the tendency to give greater weight to data that support a preliminary diagnosis while failing to seek or dismissing contradictory evidence).
Having some procedures, including basic screening and diagnostic tests, performed by more novice health professionals or, under some circumstances, even medical students may decrease the work overload for experienced healthcare workers [11,12,13]. This, however, first requires ensuring that such examinations can be performed at the appropriate level of accuracy and repeatability.
In orthopaedics, accurate and repeatable measurements of ankle joint dorsiflexion and popliteal angle can be used to estimate injury risk and plan appropriate treatment in case of discrepancies [14, 15]. The shortening of the posterior thigh muscles, which is one of the elements influencing the size of the popliteal angle, increases the risk of knee injuries and, especially in adolescents, can lead to back pain, as well as asymmetry in the structure of the back [16,17,18]. The ankle joint dorsiflexion and knee range of motion may change during growth. Additionally, the muscular fascicle length and the tendon stiffness, which impact the ankle and knee range, may change during growth and, according to some authors, may also be influenced by stretching [17,18,19]. Nevertheless, their measurement would be valuable to determine if changes in ankle joint dorsiflexion and popliteal angle have occurred within an individual over time and due to exercise [17, 18]. However, the accuracy and repeatability of range of motion measurements can vary depending on the method and potentially on the clinician’s experience. Previous studies have suggested that being a more novice health professional may not always be an obstacle to performing some medical procedures. It is important to emphasize that the ankle joint dorsiflexion and popliteal angle tests are considered the most valuable methods in goniometric measurements [22, 23].
The present study aimed to test the agreement and repeatability of measurements of the ankle joint dorsiflexion and popliteal angle test in adolescents performed twice by three raters differing in clinical experience: an orthopaedics specialist (male, 16 years of experience in goniometry and patient examination, 41 years old), a resident medical doctor in orthopaedics (male, 4 years of experience in goniometry and patient examination, 29 years old), and a physiotherapy student (male, 1 year of experience in goniometry and patient examination, 24 years old). We hypothesized that the more experienced the rater, the better the repeatability of the measurement.
The study group consisted of 142 (57 female, 85 male) adolescents attending junior high school in Poznań, Poland (age 13–15, mean ± SD 13.8 ± 1.0). The inclusion criteria were as follows: no orthopaedic and/or neurological condition, practicing sports only at school, attending standard curriculum (SC) or extended physical activity curriculum (EPAC). Only healthy participants were included in the study because many orthopaedic conditions, and particularly neurological and neuro-orthopaedic disorders, are accompanied by spasticity, a phenomenon that reduces the range of joint motion and deforms the lower limb [24,25,26].
Overall, 60 and 82 subjects attending EPAC and SC were recruited, respectively. At the time of the study, there was a total of 5117 junior high school attendees in Poznań (although the share of healthy subjects was not possible to estimate). With this in mind, the representativeness of the sample size was assessed with Cochran’s formula. The calculation indicated that for the considered sample size (n = 142) the margin of error was 8.1% at the confidence level of 95%.
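The margin of error quoted above can be approximated with a short calculation. The sketch below is illustrative only: the source does not state the inputs used, so the assumed proportion p = 0.5 (the most conservative choice), the critical value z = 1.96 for 95% confidence, and the use of a finite population correction are all assumptions, and the function name is hypothetical.

```python
from math import sqrt


def margin_of_error(n, population, p=0.5, z=1.96):
    """Margin of error for a sample proportion with finite population correction.

    Assumptions (not stated in the source): p = 0.5, z = 1.96 for 95% confidence.
    """
    # Standard error of a proportion for a simple random sample of size n
    standard_error = sqrt(p * (1 - p) / n)
    # Finite population correction, relevant because n is not negligible vs. N
    fpc = sqrt((population - n) / (population - 1))
    return z * standard_error * fpc


# Sample size and population size as reported in the text
e = margin_of_error(n=142, population=5117)
print(f"{e:.1%}")  # prints 8.1%
```

Under these assumptions the result agrees with the reported 8.1%; without the finite population correction the same inputs give roughly 8.2%.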
The EPAC subjects had 14–18 physical education classes per week, while SC had four classes, 45 min each, starting with a few-minute warm-up including running, squats, and static stretching. The sports practiced during the classes included football, basketball and field hockey. The study protocol was approved by the Bioethical Committee of the Poznan University of Medical Sciences (Approval No. 212/17). All parents and school heads gave their written consent for the study. The subjects provided verbal consent before the examinations. Three subjects did not agree to participate in the study, despite the written consent of their parents, and all three were excluded from the examination. All participants were advised on the purpose and course of the investigations and were free to withdraw from the examination at any time.
The ankle joint dorsiflexion, the ankle joint dorsiflexion with the knee in 90 degrees of flexion, and the popliteal angle test were measured in all subjects twice within a 2-h interval. A need for such a short interval in studies of repeatability was acknowledged in previous research [28, 29]. Each rater performed both test series on no more than 15 students a day to avoid fatigue. Examination of 142 individuals was performed during 11 visits to schools. Three individuals were asked to enter the examination room at each examination. Afterward, they were instructed to return to classes and come back to the examination room after 2 h. Each time, the measurements were taken by the three raters, who had undergone 2 weeks of training in patient examination, conducted in the orthopaedics ward.
The results recorded by one rater were not available to others during the study. Similarly, the results recorded during the first test were unavailable for a rater before a second measurement series. Before the examinations, the raters were double pre-checked in terms of quality and skill of the testing by the independent specialists in orthopaedics who did not participate in the study. According to the provided opinion, all three raters were performing the examination correctly.
None of the examined subjects had physical activity classes on the day of testing. All examinations were performed with students lying on a mattress. The maximum passive range of ankle joint dorsiflexion was checked in a supine position with lower extremities extended. Ankle joint dorsiflexion was also evaluated with the hip and knee joints flexed to 90 degrees. During the examination, it was ensured that the dorsiflexion was in the ankle to eliminate the action of the middle and forefoot. The raters took special care to perform the dorsiflexion in the neutral position of the ankle and foot, without any inversion or eversion. The last test was a popliteal angle test (maximum extension of the knee joint with the hip flexed to 90 degrees).
The ankle joint dorsiflexion was measured using two landmarks: the proximal (the fibular shaft and over the lateral malleolus) and the distal (the shaft of the fifth metatarsal). The axis of the goniometer was distal to, but in line with, the lateral malleolus at the intersection of lines through the lateral midline of the fibula and the lateral midline of the fifth metatarsal. The same landmarks and goniometer axis were applied for the measurement of ankle joint dorsiflexion with 90-degree knee flexion. The assistant (a pre-instructed school nurse) stabilized the knee in the position previously found by the rater. The assistant also held the electronic inclinometer (baseline digital inclinometer) used to verify the 90 degrees of hip flexion, although it was the rater who observed the reading. To measure the popliteal angle, the hip was flexed to 90 degrees, additionally confirmed with an electronic inclinometer. The goniometer was held alongside the thigh, pointing to the greater trochanter, with the second landmark alongside the shin to the lateral malleolus. The goniometer axis was the lateral condyle of the femur. Additionally, the assistant stabilized the knee in the position found by the rater and held the inclinometer. These measurement techniques are considered the most reliable [30,31,32]. Their scheme is presented in Fig. 1.
During all examinations, the hip and knee were stabilized by the pre-instructed nurse. The subjects did not wear socks and wore loose shorts, which did not restrict movement. Both lower limbs were examined. The range of motion was measured with a universal goniometer (Merck, Darmstadt, Germany) because it is a widely used and accepted instrument in orthopedics, and previous studies have shown that its use in the clinical evaluation of ankle joint dorsiflexion, knee examination (especially the active knee extension test), and popliteal angle is reliable [22, 23, 33]. The range of the popliteal angle and ankle joint dorsiflexion was expressed in degrees.
The statistical analysis was performed using Statistica 12 (StatSoft Inc., Tulsa, OK, USA) and PQStat (PQStat Software, Poznań, Poland), and p < 0.05 was considered statistically significant. The assumption of the Gaussian distribution was evaluated with the Shapiro-Wilk test. To assess the reliability of rater scores, the stability of scores and the agreement were analyzed. To evaluate the stability, the test-retest reliability method was used: the same group was tested twice using the same measurement tool, and stability was shown by high repeatability. To that end, we analyzed each rater’s scores for changes between examination I and II using the Wilcoxon test because the analyzed variables were not normally distributed. The Spearman correlation coefficients (Rs) were also calculated. Kendall’s coefficient (W) was calculated for the first and the second examination to determine the agreement between the scores from the three raters (a specialist in orthopedics, a resident medical doctor in orthopedics and a physiotherapy student). Kendall’s W lower than 0.4 was considered as insufficient agreement, Kendall’s W in the range (0.40; 0.60) was rated as satisfactory agreement, (0.60; 0.80) as good agreement, and Kendall’s W above 0.80 was considered to be a very good agreement. The analysis was performed for each evaluated aspect.
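For readers who wish to reproduce the rank-based statistics named above outside Statistica or PQStat, the following is a minimal sketch in plain Python, not the authors' implementation. It computes the Spearman coefficient (Rs) between two examination series and Kendall's W across raters with a correction for tied scores. All function names and the example angles are hypothetical, and the handling of values falling exactly on a band boundary (0.40, 0.60, 0.80) is an assumption, since the text does not specify it.

```python
from collections import Counter
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(first, second):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(first), average_ranks(second)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)


def kendalls_w(ratings):
    """Kendall's W with tie correction.

    ratings: one list per rater, each holding scores for the same n subjects.
    """
    m, n = len(ratings), len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Tie term: sum of (t^3 - t) over every group of t tied scores per rater
    ties = sum(t ** 3 - t for row in ratings for t in Counter(row).values())
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)


def agreement_band(w):
    """Map W to the verbal bands used in the text (boundary handling assumed)."""
    if w < 0.40:
        return "insufficient"
    if w < 0.60:
        return "satisfactory"
    if w <= 0.80:
        return "good"
    return "very good"


# Hypothetical angles in degrees: three raters, five subjects
raters = [[10, 15, 12, 20, 18], [12, 14, 11, 21, 17], [10, 16, 13, 19, 18]]
w = kendalls_w(raters)
print(round(w, 2), agreement_band(w))
print(round(spearman_rs(raters[0], raters[1]), 2))
```

The tie correction matters for goniometric data, where angles recorded in whole degrees frequently coincide between subjects.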
The summary of measurement results obtained by each rater is provided in Table 1. The results of the repeatability of scores obtained by three raters are summarized in Table 2. In the case of the orthopedic specialist, a lack of repeatability of scores was found for passive ankle joint dorsiflexion of the left foot and the left popliteal angle. For other parameters, the repeatability was retained. The medical resident in orthopedics displayed a lack of repeatability for evaluation of the right ankle joint dorsiflexion with the knee joint bent. Repeatability was demonstrated for all the other aspects. The physiotherapy student showed the best performance with the repeatability found for all the evaluated parameters.
The Kendall’s W values ranged from 0.63 to 0.78 depending on the parameter, indicating good inter-rater agreement (Table 3).
The present study provides insight into the repeatability and agreement of the measurements of the ankle joint dorsiflexion and popliteal angle test provided by three raters who differed in clinical experience. As revealed, the measurements undertaken by an experienced specialist were not repeatable for two out of all six parameters of examination, both for the left side. It may potentially be a chance finding. All the raters were right-handed, while the examination was performed in a similar fashion on both sides. A resident in orthopaedics displayed a lack of repeatability for one parameter (evaluation of the right ankle joint dorsiflexion with the knee joint bent), while the physiotherapy student outperformed the rest of the raters not only in repeatability but also in reliability of measurements, as demonstrated by the highest values of Spearman’s correlation coefficient. Previous studies provide evidence that clinical experience is not, in selected situations, related to better performance [20, 35]. For example, Borstad and Briggs have shown no difference between novice and experienced clinicians in a latissimus dorsi length measurement. Morgan and Cleave-Hogg indicated that clinical experience had no predictive value in performance assessments when using standardized anesthesia simulation scenarios. The observation of the present study may have different explanations. It may arise from the assumption of orthopaedics specialists that slight differences in angle measurements will have little significance for a further clinical course, in particular for future treatment. In turn, a student may perform best due to his potential belief in the need for a thorough examination to yield accurate and clinically relevant results. By no means does the present paper intend to challenge the significance of clinical experience in the accuracy and repeatability of medical measurements. Numerous works show that more advanced techniques require training, particular skills and knowledge [36,37,38].
However, since the evaluation of the parameters considered in our study is not highly challenging, it is worth highlighting that novice individuals could assess them without a quality loss.
The second objective of the present study was to evaluate the level of agreement between measurements performed by different raters during both examination series. This is important in clinical research and practice as it is frequently necessary to trust results provided by other health professionals. Although some previous studies on the inter-rater agreement in orthopaedics measurements indicated a very good or even excellent level as high as 95%, it should be noted that this is likely a result of a small number of tested samples/individuals, e.g., 15 radiographs, 7 cadaver specimens or 20–25 patients [41, 42]. At the same time, it was highlighted that such analyses are only valid if the number of subjects is at least 50. The reliability of goniometric measurements of knee and foot range of motion reported in some of the previous studies is higher compared to that in our research [22, 23, 33, 44]. However, all these studies investigated reliability on small samples. Research involving a greater number of subjects had similar reliability to that obtained in our study.
A negative aspect of the test-retest reliability method is the interval between the test and re-test. When the interval is too short, the rater may remember the scores and give similar scores on the re-test, which will increase the value of the correlation coefficient. If the interval between the measurements is too long, the correlation coefficient may be lower. In our study, correlation coefficients for all the raters and most aspects suggested a significant relationship (Rs = 0.6–0.8). This justifies the claim that the interval between the first and second examination was adequate. Importantly, the second Kendall’s W value was slightly lower for three aspects and slightly higher for three aspects in comparison to the first examination. This suggests that despite the 2 h that passed between the examinations, the raters scored patients independently of one another, without any consultations after the first examination.
An increase in the number of subjects can result in a decrease in agreement level. For example, in one study encompassing 60 healthy subjects, the inter-rater agreement in measurements of small angles of dorsiflexion was classified only as fair. The present study examined a total of 142 subjects, nearly three-fold the threshold of 50 recommended for measurements of reliability. Despite the differences in repeatability of selected parameters related to the rater’s medical experience, shown in the previous sub-section, the agreement results indicate that measurements of the ankle joint dorsiflexion and popliteal angle test performed by different physicians can generally be trusted.
Study limitations must be considered. The research included only healthy individuals. This is because various orthopedic disorders are accompanied by spasticity, which can change dynamically within a short period of time and could even be influenced by the examination itself. Therefore, including subjects with pathologies could bias the findings, their interpretation and conclusions. However, it cannot be entirely ruled out that examination of healthy subjects also influenced the results, e.g., through the orthopedic specialist’s assumption that slight differences in angle measurements will have little significance for a further clinical course. It remains unknown whether novice health professionals could evaluate similar parameters in disabled subjects without quality loss; this would require an additional, specifically designed study.
The present study showed that more novice physicians could potentially perform selected orthopaedics examinations of healthy subjects without a quality loss. Further studies employing a larger number of compared raters and disabled patients are required to confirm this conclusion. As demonstrated, the precision of the evaluation had a significant impact on the score, while the effect of the rater’s professional experience was smaller. The least experienced rater, a student of physical therapy, showed the highest repeatability of measured parameters.
Availability of data and materials
The datasets used and analysed during the current study are available from the corresponding author on reasonable request.
Wilson ML. Education and training: practice makes perfect. Am J Clin Pathol. 2002;118:167–9.
Walsh BM, Wong AH, Ray JM, Frallicciardi A, Nowicki T, Medzon R, et al. Practice makes perfect: simulation in emergency medicine risk management. Emerg Med Clin North Am. 2020;38:363–82.
Stanislawczyk L. Human or mouse? Practice makes perfect. Lab Anim (NY). 2019;48:319.
Weigl M, Müller A, Sevdalis N, Angerer P. Relationships of multitasking, physicians’ strain, and performance. J Patient Saf. 2013;9:18–23.
Kalisch BJ, Aebersold M. Interruptions and multitasking in nursing care. Jt Comm J Qual Patient Saf. 2010;36:126–32.
Hall LH, Johnson J, Watt I, Tsipa A, O’Connor DB. Healthcare staff wellbeing, burnout, and patient safety: A systematic review. PLoS One. 2016;11:e0159015.
Douglas HE, Raban MZ, Walter SR, Westbrook JI. Improving our understanding of multi-tasking in healthcare: drawing together the cognitive psychology and healthcare literature. Appl Ergon. 2017;59:45–55.
Mamede S, van Gog T, van den Berge K, Rikers RMJP, van Saase JLCM, van Guldener C, et al. Effect of availability bias and reflective reasoning on diagnostic accuracy among internal medicine residents. JAMA. 2010;304:1198–203.
Elston DM. Confirmation bias in medical decision-making. J Am Acad Dermatol. 2020;82:572.
Saposnik G, Redelmeier D, Ruff CC, Tobler PN. Cognitive biases associated with medical decisions: a systematic review. BMC Med Inform Decis Mak. 2016;16:138.
Bazan D, Nowicki M, Rzymski P. Medical students as the volunteer workforce during the COVID-19 pandemic: Polish experience. Int J Disaster Risk Reduct. 2021;55:102109.
Davis DP, Campbell CJ, Poste JC, Ma G. The association between operator confidence and accuracy of ultrasonography performed by novice emergency physicians. J Emerg Med. 2005;29:259–64.
Gottlieb M, Bailitz JM, Christian E, Russell FM, Ehrman RR, Khishfe B, et al. Accuracy of a novel ultrasound technique for confirmation of endotracheal intubation by expert and novice emergency physicians. West J Emerg Med. 2014;15:834–9.
Dickson D, Hollman-Gage K, Ojofeitimi S, Bronner S. Comparison of functional ankle motion measures in modern dancers. J Dance Med Sci. 2012;16:116–25.
Shitara H, Tajika T, Kuboi T, Ichinose T, Sasaki T, Hamano N, et al. Ankle dorsiflexion deficit in the back leg is a risk factor for shoulder and elbow injuries in young baseball players. Sci Rep. 2021;11:5500.
Napiontek M, Czubak J. Hamstring shortening: postural defect or congenital contracture. J Pediatr Orthop B. 1998;7:71–6.
Al Attar WSA, Soomro N, Sinclair PJ, Pappas E, Sanders RH. Effect of injury prevention programs that include the Nordic hamstring exercise on hamstring injury rates in soccer players: a systematic review and meta-analysis. Sports Med. 2017;47:907–16.
Monajati A, Larumbe-Zabala E, Goss-Sampson M, Naclerio F. The effectiveness of injury prevention programs to modify risk factors for non-contact anterior cruciate ligament and hamstring injuries in uninjured team sports athletes: A systematic review. PLoS One. 2016;11:e0155272.
Radnor JM, Oliver JL, Waugh CM, Myer GD, Moore IS, Lloyd RS. The influence of growth and maturation on stretch-shortening cycle function in youth. Sports Med. 2018;48:57–71.
Borstad JD, Briggs MS. Reproducibility of a measurement for latissimus dorsi muscle length. Physiother Theory Pract. 2010;26:195–203.
Blasier RB. The problem of the aging surgeon: when surgeon age becomes a surgical risk factor. Clin Orthop Relat Res. 2009;467:402–11.
Martin RL, McPoil TG. Reliability of ankle goniometric measurements: a literature review. J Am Podiatr Med Assoc. 2005;95:564–72.
Shamsi M, Mirzaei M, Khabiri SS. Universal goniometer and electro-goniometer intra-examiner reliability in measuring the knee range of motion during active knee extension test in patients with chronic low back pain with short hamstring muscle. BMC Sports Sci Med Rehabil. 2019;11:4.
Pidgeon TS, Ramirez JM, Schiller JR. Orthopaedic management of spasticity. R I Med J (2013). 2015;98:26–31.
Woo R. Spasticity: orthopedic perspective. J Child Neurol. 2001;16:47–53.
Balci BP. Spasticity measurement. Noro Psikiyatr Ars. 2018;55:S49–53.
Cochran WG. Sampling Techniques. 3rd Edition. New York: John Wiley; 1977. https://www.academia.edu/29684662/Cochran_1977_Sampling_Techniques_Third_Edition.
Hamela-Olkowska A, Dangel J. Estimation of the atrioventricular time interval by pulse Doppler in the normal fetal heart. Ginekol Pol. 2009;80:584–9.
Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol. 2008;31:466–75.
Venturini C, Ituassú NT, Teixeira LM, Deus CVO. Confiabilidade intra e interexaminadores de dois métodos de medida da amplitude ativa de dorsiflexão do tornozelo em indivíduos saudáveis. Braz J Phys Ther. 2006;10:407–11.
Jonson SR, Gross MT. Intraexaminer reliability, interexaminer reliability, and mean values for nine lower extremity skeletal measures in healthy naval midshipmen. J Orthop Sports Phys Ther. 1997;25:253–63.
Themes UFO. Measurement of range of motion of the ankle and foot. Musculoskeletal Key 2016 [cited 2022 Jul 21]. Available from: https://musculoskeletalkey.com/measurement-of-range-of-motion-of-the-ankle-and-foot/
Alawna MA, Unver BH, Yuksel EO. The reliability of a smartphone goniometer application compared with a traditional goniometer for measuring ankle joint range of motion. J Am Podiatr Med Assoc. 2019;109:22–9.
Field AP. Kendall’s coefficient of concordance. In: Encyclopedia of statistics in behavioral science. Chichester: Wiley; 2005. https://doi.org/10.1002/0470013192.bsa327.
Morgan PJ, Cleave-Hogg D. Comparison between medical students’ experience, confidence and competence. Med Educ. 2002;36:534–9.
Farivar BS, Flannagan M, Leitman IM. General surgery residents’ perception of robot-assisted procedures during surgical training. J Surg Educ. 2015;72:235–42.
Marvin K, Bowman P, Keller MW, Ambrosio AA. Effectiveness of an advanced airway training “boot camp” for family medicine physician trainees. Otolaryngol Head Neck Surg. 2020;163:204–8.
McSparron JI, Michaud GC, Gordan PL, Channick CL, Wahidi MM, Yarmus LB, et al. Simulation for skills-based education in pulmonary and critical care medicine. Ann Am Thorac Soc. 2015;12:579–86.
Ali Z, Karim H, Wali N, Naraghi R. The inter- and intra-rater reliability of the Maestro and Barroco metatarsal length measurement techniques. J Foot Ankle Res. 2018;11:47.
Carter TI, Pansy B, Wolff AL, Hillstrom HJ, Backus SI, Lenhoff M, et al. Accuracy and reliability of three different techniques for manual goniometry for wrist motion: a cadaveric study. J Hand Surg Am. 2009;34:1422–8.
Barker KL, Lamb SE, Burns M, Simpson AH. Repeatability of goniometer measurements of the knee in patients wearing an Ilizarov external fixator: a clinic-based study. Clin Rehabil. 1999;13:156–63.
Parel I, Cutti AG, Kraszewski A, Verni G, Hillstrom H, Kontaxis A. Intra-protocol repeatability and inter-protocol agreement for the analysis of scapulo-humeral coordination. Med Biol Eng Comput. 2014;52:271–82.
Hopkins WG. Measures of reliability in sports medicine and science. Sports Med. 2000;30:1–15.
Hancock GE, Hepworth T, Wembridge K. Accuracy and reliability of knee goniometry methods. J Exp Orthop. 2018;5:46.
Brosseau L, Tousignant M, Budd J, Chartier N, Duciaume L, Plamondon S, et al. Intratester and intertester reliability and criterion validity of the parallelogram and universal goniometers for active knee flexion in healthy subjects. Physiother Res Int. 1997;2:150–66.
Ethics approval and consent to participate
The study protocol was approved by the Bioethical Committee of the Poznan University of Medical Sciences (Approval No. 212/17). All parents and school heads gave their written consent for the study. The subjects gave oral consent just before the examinations.
Consent for publication
None to declare.
Cite this article
Pietrzak, K., Miechowicz, I., Nowocień, K. et al. The agreement and repeatability of measurements of ankle joint dorsiflexion and poplietal angle in healthy adolescents. J Foot Ankle Res 15, 67 (2022). https://doi.org/10.1186/s13047-022-00572-1