Clinical photographic observation of plantar corns and callus associated with a nominal scale classification and inter-observer reliability study in a student population
Journal of Foot and Ankle Research volume 10, Article number: 45 (2017)
The management of plantar corns and callus has a low cost-benefit and is given reduced priority in healthcare. The distinction between the types of keratin lesion that form corns and callus has attracted limited interest. Observation is imperative to improving diagnostic predictions, yet a number of studies point to some confusion as to how best to achieve this. The use of photographic observation has been proposed to improve our understanding of intractable keratin lesions.
Students from a podiatry school reviewed photographs in which plantar keratin lesions were divided into four nominal groups: light callus (Grade 1), heavy defined callus (Grade 2), concentric keratin plugs (Grade 3) and callus with deeper density changes under the forefoot (Grade 4). A group of ‘experts’ drawn from qualified podiatrists validated the students’ observer-rated responses.
Cohen’s weighted kappa statistic (k) was used to measure inter-observer reliability. First year students (unskilled) performed less well when viewing photographs (k = 0.33) than third year students (semi-skilled, k = 0.62). The experts performed better than the students (k = 0.88), consistent with wound care models in other studies.
Improved clinical annotation of clinical features, supported by classification of keratin-based lesions and combined with patient outcome tools, could improve the scientific rationale to prioritise patient care. Problems associated with photographic assessment involve trying to differentiate similar lesions without the benefit of direct palpation. Direct observation of callus with and without debridement requires further investigation alongside the model proposed in this paper.
While photography offers a common method for assessing wounds, no published evaluation has been applied to plantar forefoot corns and callus. Pain arising from increased epidermal thickness and concentrated areas of keratin has been associated with corns, callus and infection of the skin by human papilloma virus (HPV). During the mid-twentieth century the term keratoma, often described as intractable plantar keratoma (IPK), was popularised by foot surgeons in North America, where an unofficial six-stage classification included viral warts (HPV).
Confusion associated with HPV has fuelled debate where diagnosis rests on clinical observation alone. When 43 cases were reviewed after circular excision, recurrence showed 51.1% of excised corns were associated with HPV. Many professionals believe they can determine the difference between corns and verrucae, and yet it is clear that clinical presentation is not always sufficient to secure an accurate diagnosis without biopsy.
It is acknowledged that there are neurological and vascular anomalies within callus, and human papilloma virus has provided one source of dermo-epidermal junction (DEJ) disturbance. The contribution of callus at deeper tissue level has been associated with the rupture of synovial sacs below the DEJ. The public have complained that hard skin and corns return after treatment. Sufficient evidence exists to highlight the shortfall in managing callus by debridement [7,8,9], although the use of orthoses has provided greater longevity of effect when measured over time with a visual pain scale [8, 10, 11]. Callus debridement analysis has predominantly been carried out on diabetic and rheumatoid groups rather than healthier populations where callus and corn management is part of core podiatry. Thickened epidermal tissue has been described simply as ‘callus’ and ‘painful’, without a specific location, and such descriptions can lack adequate narrative [13, 14]. Annotation of skin changes within clinical records should include colour, border variation, symmetry within lesions, and localisation of corns/callus based on standard dermatological texts. Patterns seen in elderly patients were best represented when lesions outside the metatarsal head (MTH) perimeter were included, but this was far from the case in similar papers.
A graded classification system came about as part of a study involving 1700 patients. The classification model allocated whole numbers without sub-divisions, with the scale graded 1–4 for plantar callus/corn presentation after hallux valgus surgery (Table 1).
Fewer grade 3 and 4 lesions were found compared to grade 1 and 2. Although children were included, the lesions identified in them were more likely due to HPV infection. The original data capture isolated those under 10 and those collectively under 20 years (Table 2). An assumption was made that grade 4 lesions were worse than grades 1, 2 and 3. It was reasoned that callus could be divided into four clear entities as distinct from viral warts, but clinical histological evidence has suggested that HPV infection arising at the basal layer cannot be excluded where the constituent epidermal layers and dermal papillae are altered. Further review of histology and plantar keratin is outside the remit of this paper.
Podiatric classification systems have been cited without reliability studies [18,19,20] and have extended their grading to include grades 5 and 6, where the latter related to epidermal breakdown. Since the original model for grading callus, only a single paper has applied the method to clinical research and observation. In regard to the descriptor used for grade 1 and grade 2 callus, the author felt there was a lack of clarity regarding density changes between different types of callus. This paper appears to be the first academic review of the graded system, but it also considered other forms of physiologically related pathology. However, the photographic plates provided in two papers [21, 22] leave it unclear whether, in seeking further clarification of thickness, grade 2 lesions were confused with grade 4 lesions because the original descriptor was too brief. Classification should be precise and reproducible. When cataloguing any keratin lesions, pathogenic changes should be mentioned within the narrative. Reliability has more to do with an assessment method free from measurement error. The clinician requires cost-effective and reliable systems that do not detract from clinical output.
A 32-year review of the original approach to classification of corns and callus has been considered for further evaluation in a controlled study. While this study has not been critically reviewed, one paper did consider the effect of hallux valgus on changes in the skin with callus under the forefoot. Forty-nine percent of patients in the original study group of 1700 showed callus under the second MTH, while in a similar study of hallux valgus published 14 years later 34% of a group of 104 patients presented with callus at the same location. While no classification was provided in this Korean orthopaedic paper, the interest shown in a similar study was helpful. Because the brief descriptors were hypothesised to be a limitation, a reworked descriptor was introduced into the method.
Pilot study & expert panel selection
Two pilot photographic studies were carried out in 2013–2014 at two national conferences by consensual agreement with the organisers and participants. The observer raters were all qualified podiatrists. The first pilot study included an introduction and descriptors, while the second study relied on descriptors alone. The second pilot study invited original observer raters with scores of 80% or above for the same photographs to review a different set of photographs. Six observer raters scoring 83% (5/6 photographs) were accepted as ‘experts’. Five podiatrists (skilled) together with one biophysics engineer were recruited into the study (n = 6).
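The selection rule described above (a score of 80% or above across six photographs) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, rater labels and grades are hypothetical and are not data from the study.

```python
def percent_correct(ratings, reference):
    """Percentage of photographs graded identically to the reference grades."""
    if len(ratings) != len(reference) or not reference:
        raise ValueError("ratings and reference must be non-empty and of equal length")
    matches = sum(1 for given, expected in zip(ratings, reference) if given == expected)
    return 100.0 * matches / len(reference)


def select_experts(candidates, reference, threshold=80.0):
    """Return the names of raters whose score meets the threshold (in percent)."""
    return [
        name
        for name, ratings in candidates.items()
        if percent_correct(ratings, reference) >= threshold
    ]


# Hypothetical reference grades (1-4) for six photographs.
REFERENCE_GRADES = [1, 2, 3, 4, 2, 3]

# Hypothetical pilot ratings; none of these values come from the paper.
PILOT_RATINGS = {
    "rater_a": [1, 2, 3, 4, 2, 1],  # 5 of 6 correct
    "rater_b": [1, 1, 3, 2, 2, 4],  # 3 of 6 correct
    "rater_c": [1, 2, 3, 4, 2, 3],  # 6 of 6 correct
}

experts = select_experts(PILOT_RATINGS, REFERENCE_GRADES)
```

With six photographs the only scores at or above 80% are 5/6 (83.3%) and 6/6, which is why 83% appears as the entry score for the expert panel.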
All students were resident at first and third year level at a Podiatry School within a University Department of Health Sciences and were selected by an appointed tutor. Students were recruited along the same lines as the skilled observer raters, without previous knowledge of the model grading method [25, 26]. PowerPoint™ was used to present 6 slides to the student observer raters (Fig. 1) in a classroom, and all anonymised sheets were returned to a podiatry tutor. First year students (n = 31) were inexperienced (first semester) and termed unskilled. Third year students (n = 24) had some clinical experience, having just completed their second year, and were considered semi-skilled. The skilled observers were used to validate the photographic lesions independently of the researcher (Table 2).
Photos used in the PowerPoint™ slides were taken using a Canon Powershot SX50HS set at the highest definition, with macro settings and standard lighting control and without flash. Appropriate patient consent was obtained. Poor quality slides were removed following the two pilot studies. No plates contained recognisable facial features, and all were anonymised for the observer raters.
Reliability was expressed as a value of the weighted quadratic kappa statistic for observer ratings on a nominal or ordinal scale graded 1–4. A contingency table calculated the frequency of agreement and disagreement for each lesion. The strength of agreement for k = 0.81–1.0 implied an almost perfect state, k = 0.41–0.60 moderate, k = 0.21–0.40 fair and k = 0.10–0.20 slight. Values of the quadratic weighted statistic obtained alongside percentage responses are reported (Table 4).
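As an illustration of the statistic described above, the following is a minimal Python sketch of Cohen's weighted kappa with quadratic weights, built from a contingency table of two sets of grades. It assumes one observer is compared with one reference rating per photograph; the paper does not state exactly how ratings were paired, and the function name and example grades are hypothetical. Quadratic weights treat the grades as ordered, so a grade 1 versus grade 4 disagreement is penalised more heavily than grade 1 versus grade 2.

```python
def quadratic_weighted_kappa(rater_a, rater_b, categories=(1, 2, 3, 4)):
    """Cohen's weighted kappa using quadratic disagreement weights."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must grade the same, non-empty set of items")
    n = len(rater_a)
    k = len(categories)
    index = {category: position for position, category in enumerate(categories)}

    # Contingency table of joint grades (rows: rater A, columns: rater B).
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight grows with the squared distance between grades.
            weight = ((i - j) ** 2) / ((k - 1) ** 2)
            weighted_observed += weight * observed[i][j] / n
            # Expected proportion under chance, from the marginal totals.
            weighted_expected += weight * row_totals[i] * col_totals[j] / (n * n)

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use one identical grade")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical grades for six photographs; not data from the study.
reference = [1, 2, 3, 4, 2, 3]
student = [1, 2, 3, 2, 2, 4]
kappa = quadratic_weighted_kappa(student, reference)
```

Identical gradings give k = 1, while agreement no better than chance gives k = 0. Unlike a raw percentage score, the statistic subtracts the agreement expected from the marginal totals alone.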
First year students demonstrated lower ability when observing photographs (k = 0.33). While most students graded more than one of the six slides correctly, the majority achieved 33–67% of the possible correct scores, with 22% scoring 83.3% or above. Case slide 4 proved more difficult amongst expert raters. This consisted of a lesion with a partial border under the second metatarsal head. Lack of visual depth perception could mislead the observer when considering the edge of any epidermal thickening. Partial or whole borders were intended to be interpreted as grade 2. Location would ultimately play a significant part, as would the presence of an adjunctive deformity in any of the toes. Further work on post-debridement assessment is required to consider any impact on the classification model. One potential value of debridement is the ability of the skilled clinician to expose the deeper level of the epidermis to assess underlying pathology invoked by DEJ disturbance. The presence of underlying cysts and bursae, however, may not be exclusive to grade 1 or 2 keratin lesions.
Photography has been applied to a number of observation projects within musculoskeletal research using Cohen’s kappa statistic for categorical data. While other studies have used intraclass correlation coefficient (ICC) statistics for reliability, Cohen tried to account for some of the errors that arise when observation reliability is measured with percentages. Reliability is related to lack of variation in a classification system when it is repeated [29, 30]. Intra-observer reliability was not studied in this project, but it has been considered that inter-observer ratings reflect better reliability.
In one study covering wounds caused by burns, 11 observer raters had differing levels of skill and experience, and reliability increased with experience. The same holds for podiatry students: observer reliability rose with experience (R = 0.98, derived from the k values in this study).
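The relationship between experience and reliability can be illustrated with a Pearson correlation over the three reported k values. This is a sketch under a stated assumption: the paper does not say how experience was quantified, so the ordinal coding below (1 = first year, 2 = third year, 3 = expert) is hypothetical, and the coefficient it yields need not equal the reported R = 0.98.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Assumed ordinal coding of experience; this coding is not given in the paper.
experience_level = [1, 2, 3]
# Kappa values reported in the text for first years, third years and experts.
kappa_values = [0.33, 0.62, 0.88]

r = pearson_r(experience_level, kappa_values)
```

With only three groups, any such coefficient should be read as descriptive rather than as a test of association.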
Students’ previous academic experience was broken down into seven categories but showed no correlation with ability. While the study suggested greater reliability from qualified podiatrists spread over a greater geographical area, better control was sought within an educational setting. The experts provided contrast to the students’ results and were more consistent for the small panel selected. The experts achieved a reasonable outcome (k = 0.88/83%). Based on kappa, the observational system using photographic evidence alone appears reliable within the context of fitting in with descriptors (Table 3). Without the use of additional tools such as the Manchester Foot Pain and Disability Index (MFPDI), clinical validation would have to be assessed further.
Wound classification observer studies have been used by expert panels to assist observation of other raters. The weighted quadratic kappa (k) statistic assists with the differentiation between poor, moderate and good observation scores. Pairs of nurses achieved inter-observer classification ratings of k = 0.81–0.97 for ulcers but fared less well when working independently (k = 0.49). Podiatrists usually work alone but may have shared information in the classroom-based exercise.
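The banding of kappa into descriptive labels can be expressed as a small lookup. The sketch below follows the Landis and Koch scale cited in the reference list. Two points are assumptions relative to the text of this paper: the ‘substantial’ band (0.61–0.80) and the ‘poor’ label for negative values come from that published scale rather than from the bands quoted here, and the ‘slight’ band is taken to start at 0.00. The function name is hypothetical.

```python
def interpret_kappa(kappa):
    """Map a kappa value to a descriptive strength-of-agreement label."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0.0:
        return "poor"
    # Upper bound of each band, checked in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),  # band from Landis and Koch, not quoted in the text
        (1.00, "almost perfect"),
    ]
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label


# Kappa values reported in the text for the three observer groups.
labels = {
    "first year": interpret_kappa(0.33),
    "third year": interpret_kappa(0.62),
    "experts": interpret_kappa(0.88),
}
```

Other authors draw the boundaries differently, so a label should always be reported alongside the kappa value itself.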
Comparable photographic reliability results were higher for experts at 0.83 in this study, and in other studies using the same approach: 0.87 and 0.91. Inexperienced observers in this study reached a mean of 0.33–0.62. In contrast, nurses scored 0.33, suggesting any value below 0.59 was less satisfactory for wound observation. Methodology from wound studies could not be directly compared to corns and callus [25, 26, 30], although values of k = 0.45–0.75 were ‘fair to good’.
The hypothesis upon which four nominally graded options for corns and callus were based involved ‘staging’ to show the critical nature of lesions with and without hallux valgus deformity. While no evidence of staging for epidermal thickening exists in the literature, skin that blisters following shoe rub can alter with epidermal thickening. While some resistance has been offered to expanding the grades further, errors could arise if the choice of selection becomes blurred. Where seven grades for shearing callus were used for pedal skin, classification became impractical when transferring definition from text to clinic. This was also found in a paediatric dental study where 10 levels were used. Observer raters assessing enamel damage in paediatric teeth from photographs fared less well when judging degrees of enamel trauma than when judging colour variation. Extensive lists of classifications, where the descriptor has large numbers of different options, can weaken the method’s effectiveness. Eight stages of classification used to describe fingertip injuries produced poor observational results.
It is acknowledged that while more options might allow for easier classification not all lesions would be possible to classify into four categories. It would be unlikely, given both pilot study results and controlled study results, that 100% reliability could be achieved. While errors would not have significant consequences if keratin classification was mistaken, the key contribution could add to diagnostic unpredictability unless combined with reliable tools to provide a quality-related tool.
No two lesions are the same, and DEJ pathology varies widely, as the dimensions of depth change according to sub-dermal damage. Inevitably this makes assigning lesion grading more difficult. In a study where photographic observation of wounds included pressure ulcers, a large proportion of photographs were not stageable, even by the experts. This was often because eschar covering the wound made it impossible to judge the extent of tissue involvement. Where extravasation arises within dense keratin overlying callus, skilled debridement ensures the DEJ has not been penetrated. It is at this point that new judgement and appropriate management are considered.
Clinical examination may reach a finite point where lesion differentiation cannot be made conclusively, whether by direct observation or from photographs without debridement. In this regard there is no contention that the use of a classification system will answer the clinician’s problems in isolation. Variations such as verrucae, fissures and pitted keratolysis must be excluded to avoid extending any unintentional inclusion with the model. However, from recent analysis of excised lesions, the exclusion of HPV infections will have to be reconsidered by all clinicians involved in skin management and may need to be included within the descriptor. Furthermore, once the DEJ is breached, thus forming first an erosion, then an ulcer, a different system of classification should be assigned as new pathology enters the equation.
It may be reasonable to avoid using any classification model where too many conditions become enveloped under one ‘umbrella’ system. Prognosis and outcome could be underpinned by classification provided that quantitative methods are added, e.g. a visual analogue scale for pain and an assessment based on a validated health tool. Confounding errors arise more readily from photographs if the descriptors used to judge lesions are ambiguous. The difference between percentage of fibrin covering the wound versus area of epithelisation demonstrated this aspect of observation [25, 26]. Boundary definition and callus density within the lesion appear to suffer similar errors.
Debridement as cyclical treatment has been considered an important component of ‘Core Podiatry’, but the evidence fails to make a compelling argument for its continuance without change: for the low risk categories, debridement does not sustain improvement in pain unless it is repeated [7,8,9,10,11]. Paradoxically, avoidance of cyclical management will offer more attraction to commissioners of health care. Inevitably classification could help to prioritise patient management of callus, but without validation from other analytical methods, predictable outcomes will remain challenging.
Considerations for classification have been revisited after a 30+ year period to highlight weaknesses within existing clinical healthcare models for corns and callus, especially within the NHS. Used alone, classification remains limited but may provide a method to show improvement or deterioration. When considered with good quality dermatological description and assessment of quality of life, the clinician could use triage by patient questionnaire and photographic media to improve consultations. Problems associated with photographic assessment involve trying to differentiate two similar lesions using a flat or 2-D representation without the benefit of direct palpation.
Classification does not differentiate other pathology such as foreign bodies, fibrous changes within the DEJ, inclusional cysts, bursae, effects of disrupted metatarsophalangeal joints, HPV and neoplasia. A descriptor should cover all possibilities, but dermatological lesions unrelated to surface pressure or DEJ damage can obfuscate the clinician’s selection.
Reliability with observation within health must be considered important when the impact of the model used is sensitive enough to make a difference. The skill when annotating the four-point grade model depends on minimising ambiguity around border definition and recognising density changes within callus. Grades 1–4 while independent of each other could define treatment objectives by combining other tools validating impact scores and establishing underlying causes.
Kappa values for observational reliability >0.8 might provide an acceptable value for benchmarking photography, but prior tuition is important. Direct clinical observation might improve the chances of observer reliability over photographic plates.
HPV: Human papilloma virus
ICC: Intraclass correlation coefficient
IPK: Intractable plantar keratoma
Bianco M, Williams C. Using Photography in Wound Assessment. Pract Nurs. 2002;13:505–8.
Mann RA, DuVries HL. Keratotic disorders of the plantar skin. In: DuVries’ Surgery of the foot C V Mosby Company; 1978. p. 401–7.
Lopez FM, Kilmartin TE. Corn cutting in the 21st Century. Podiatry Now. 2016;10:25–7.
McCarthy DJ, Montgommery R. Heloma and Tylomata. In: McCarthy DJ, Montgommery R. Podiatric Dermatology. Williams & Wilkins; 1986. p. 54–9.
Whiting M. Affectations of the skin and subcutaneous tissues. In: Lorimer D, French G, West S, editors. Neale’s Common Foot Disorders: Diagnosis & Management. 5th ed. London: Churchill Livingstone; 1997. p. 132–6.
Hendry GJ, Gibson KA, Pile K, Taylor L, Du Toit V, Burns J, Rome K. “They just scraped off the callus”: a mixed methods exploration of foot care access and provision for people with rheumatoid arthritis in south-western Sydney, Australia. Journal of Foot and Ankle Research. 2013;6:34.
Bryan S, Parkin D, Donaldson C. Chiropody and the QALY: a case study in assigning categories of disability and distress to patients. Health Policy. 1991;18:169–85.
Siddle H, Redmond A, Waxman R, Dagg AR, Alcacer-Pitarch B, Wilkins RA, Helliwell PS. Debridement of painful forefoot plantar callosities in rheumatoid arthritis: The CARROT randomised controlled trial. Clin Rheum. 2012; https://doi.org/10.1007/s10067-012-2134-x.
Landorf KB, Morrow A, Spink MJ, Nash CL, Novak A, Potter J, Menz HB. Effectiveness of scalpel debridement for painful plantar calluses in older people: a randomized trial. Trials. 2013;14:243.
Colagiuri S, Marsden LL, Naidu V, Taylor L. The use of orthotic devices to correct plantar callus in people with diabetes. Diabet Res Clinical Pract. 1995;28:29–34.
Duffin AC, Kidd R, Chan A, Donaghue KC. High Plantar Pressure and Callus in Diabetic Adolescents. Incidence and Treatment. J Am Podiatr Med Assoc. 2003;93:214–20.
Farndon L, Vernon DW, Parry A. What is the evidence for the continuation of core podiatry services in the NHS: A review of foot surveys. Br J Podiatry. 2006;9:89–94.
Curran MJ, Ratcliffe C, Campbell JA. Comparison of Grades and thickness of adhesive felt padding in the reduction of peak plantar pressure of the foot: a case report. J Med Case Rep. 2015;9:203. https://doi.org/10.1186/s13256-015-0675-8.
Miller M, Thompson SR. Miller’s Review of Orthopaedics. 7th ed: Elsevier; 2016. p. 321.
Spink MJ, Menz HB, Lord SR. Distribution and correlates of plantar hyperkeratotic lesions in older people. Journal of Foot and Ankle Research. 2009;2:8. https://doi.org/10.1186/1757-1146-2-8.
Tollafield DR, Price M. Hallux Metatarsophalangeal Joint Survey related to Postoperative Surgery Analysis. The Chiropodist. 1985;9:284–8.
Lee K-B, Park J-K, Park YH, Seo H-Y, Kim M-S. Prognosis of Painful Plantar Callosity After Hallux Valgus Correction Without Lesser Metatarsal. 2009. https://doi.org/10.3113/FAI.2009.1048.
Sgarlato TE. Pathomechanics of various developmental abnormalities. In: Sgarlato TE. A Compendium of Podiatric Biomechanics. California College of Podiatric Medicine; 1971. p 377.
Merriman LM, Tollafield DR, Griffiths C. Plantar lesion patterns. The Chiropodist. 1987;42:145–8.
Campbell JA, Patterson A, Gregory D, Milns D, Turner W, White D, Luxton DEA, Cooke E. What happens when older patients are discharged from NHS Podiatry Services? Foot. 2002;12:32–42.
Hashmi F, Nester C, Wright C, Newton V, Lam S. Characterising the biophysical properties of normal and hyperkeratotic foot skin. Journal of Foot and Ankle Research. 2015;8:35.
Hashmi F, Nester C, Wright C, Newton V, Lam S. The reliability of non-invasive biophysical outcome measures of evaluating normal and hyperkeratotic foot skin. Journal of Foot and Ankle Research. 2015;8:28. https://doi.org/10.1186/s13047-015-0083-8.
Springett K, Merriman L. Assessment of the Skin and its Appendages. In: Merriman MM, Tollafield DR, editors. Assessment of the Lower Limb. London: Churchill Livingstone; 1995. p. 207.
Pinsolle V, Salmi LR, Evans DM, Michel P, Pelissier P. Reliability of the pulp nail bone (PNB) classification for fingertip injuries. The Journal of Hand Surgery. 2006;32E(2):188–92.
Hop MJ, Moues CM, Bogomolova K, Nieuwenhuis MK, Oen IM, Middlekoop E, Breederveld RS, van Baar ME. Photographic assessment of burn size and depth: reliability and validity. J Wound Care. 2014;23:144–52.
Bloemen MCT, Zuijlen PPM, Middlekoop E. Reliability of subjective wound assessment. Burns. 2011;37:566–71.
Sim J, Wright CC. The Kappa Statistic in Reliability Studies: Use, Interpretation, and Sample Size Requirements. Phys Ther. 2005;85:257–68.
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–74.
Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull. 1968;70:213–20.
Beeckman D, Schoonhoven L, Fletcher J, Furtado K, Gunningberg L, Heyman H, Lindholm C, Paquay L, Verdu J, Defloor T. EPUAP classification system for pressure ulcers: European reliability study. J Adv Nurs. 2007;60:682–91.
Kirk J, Miller ML. Reliability and Validity in qualitative Research. Thousand Oaks, Sage Publications; 1986. p. 19–30.
Walsh P, Butterworth PA, Urquhart DM, Cicuttini FM, Landorf KB, Wluka AE, Shanahan EM, Menz HB. Increase in body weight over a two-year period is associated with an increase in midfoot pressure and foot pain. Journal of Foot and Ankle Research. 2017;10:31. https://doi.org/10.1186/s13047-017-0214-5.
Skaare AB, Maseng Aas AL, Wang NJ. Enamel Defects in permanent incisors after trauma to primary predecessors: inter-observer agreement based on photographs. Dent Traumatol. 2013. https://doi.org/10.1111/j.1600-9657.2012.01153.x.
The author thanks the first and third year students at the University of Huddersfield, Human and Health Sciences (Podiatry); the podiatrists who submitted to the expert panel; Dr. John Stephenson for assistance with reliability statistics; and tutor Dr. Andy Bridgen for reading the drafts.
Availability of data and materials
This paper formed part of a larger MSc study, and all the conclusions drawn from the method and discussion represent only the element associated with the photographic observation method.
Ethics approval and consent to participation
This study was approved by the Human and Health Sciences Postgraduate Course Ethics panel at the University of Huddersfield.
Consent for publication
Informed consent was obtained from each person in the study, including all the observers. No animal or human tissues were involved in this study.
The author declares that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.