First-year students demonstrated lower ability when observing photographs (k = 0.33). While most students observed more than one of the six slides correctly, the majority of student observers achieved 33–67% of the possible correct scores, with 22% scoring 83.3% or above. Case slide 4 proved more difficult amongst expert raters. This consisted of a lesion with a partial border under the second metatarsal head. Lack of visual depth perception could mislead the observer when considering the edge of any epidermal thickening. Partial or whole borders were intended to be interpreted as grade 2. Location would ultimately play a significant part, as would the presence of an adjunctive deformity in any of the toes. Further work on post-debridement assessment is required to consider any impact on the classification model. One potential value of debridement is the ability of the skilled clinician to expose the deeper level of the epidermis to assess underlying pathology provoked by DEJ disturbance. The presence of underlying cysts and bursae, however, may not be exclusive to grade 1 or 2 keratin lesions.
Photography has been applied to a number of observation projects in musculoskeletal research using Cohen’s kappa statistic for categorical data. While other studies have used intraclass correlation coefficient (ICC) statistics for reliability, Cohen tried to account for some of the errors in measuring observation reliability with percentages, namely agreement that occurs by chance. Reliability is related to lack of variation in a classification system when it is repeated [29, 30]. Intra-observer reliability was not studied in this project, but it has been considered that inter-observer ratings reflect better reliability.
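The difference between raw percentage agreement and Cohen’s kappa can be illustrated with a minimal sketch. The grades below are hypothetical values for six slides and are not data from this study; the function names are likewise illustrative only.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two raters give the same grade."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters and nominal categories."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)  # observed agreement
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    categories = set(count_a) | set(count_b)
    # Agreement expected by chance, from each rater's marginal proportions.
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical grades (1 to 4) for six slides; NOT data from the study.
expert = [1, 2, 2, 3, 4, 2]
student = [1, 2, 3, 3, 4, 1]

print(round(percent_agreement(expert, student), 3))  # 0.667
print(round(cohen_kappa(expert, student), 3))        # 0.571
```

In this illustration the two raters agree on four of six slides (67%), yet kappa is lower (0.57) because part of that agreement would be expected by chance alone.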
In one study covering wounds caused by burns, 11 observer raters presented with different levels of skill and experience, and reliability increased with experience. The same relationship holds true for podiatry students: observer reliability rose as experience increased (R = 0.98, taken from the k values in this study).
Students’ previous academic experience was broken down into seven categories, but showed no correlation with ability. While the study suggested greater reliability from qualified podiatrists spread over a greater geographical area, better control was sought within an educational setting. The experts provided contrast to students’ results and were more consistent for the small panel selected. The experts achieved a reasonable outcome (k = 0.88/83%). Based on kappa, the observational system using photographic evidence alone appears reliable within the context of fitting in with descriptors (Table 3). Without the use of additional tools such as the Manchester Foot Pain and Disability Index (MFPDI), clinical validation would have to be assessed further.
Wound classification observer studies have been used by expert panels to assist observation of other raters. The quadratic weighted kappa (k) statistic assists with the differentiation between poor, moderate and good observation scores. Pairs of nurses using inter-observer classification achieved ratings of k = 0.81–0.97 for ulcers, but fared less well when working independently (k = 0.49). Podiatrists usually work alone but may have shared information in the classroom-based exercise.
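For ordinal grades, a quadratic weighted kappa penalises a disagreement by the square of the distance between the two grades, so a two-grade error counts four times as much as a one-grade error. The sketch below shows one common formulation, assuming grades are integers from 1 to `n_grades`; the ratings are hypothetical and the function name is illustrative, not the procedure used in the cited wound studies.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_grades):
    """Quadratic-weighted Cohen's kappa for ordinal grades 1..n_grades."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("rating lists must be non-empty and of equal length")

    # Observed joint proportions: rows are rater A, columns are rater B.
    observed = [[0.0] * n_grades for _ in range(n_grades)]
    for a, b in zip(rater_a, rater_b):
        observed[a - 1][b - 1] += 1 / n

    row = [sum(observed[i]) for i in range(n_grades)]
    col = [sum(observed[i][j] for i in range(n_grades)) for j in range(n_grades)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_grades):
        for j in range(n_grades):
            # Disagreement weight: 0 on the diagonal, 1 at maximum distance.
            weight = (i - j) ** 2 / (n_grades - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use a single grade")
    return 1 - weighted_observed / weighted_expected


# Hypothetical grades (1 to 4) for six slides; NOT data from the study.
expert = [1, 2, 2, 3, 4, 2]
student = [1, 2, 3, 3, 4, 1]

print(round(quadratic_weighted_kappa(expert, student, 4), 3))  # 0.842
```

Because both disagreements in this illustration are only one grade apart, the weighted value (0.84) is higher than an unweighted kappa on the same ratings would be.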
Comparable photographic reliability results were higher for experts: 0.83 in this study, and 0.87 and 0.91 in other studies using the same approach. Inexperienced observers in this study reached a mean of 0.33–0.62. In contrast, nurses scored 0.33, suggesting any value below 0.59 was less satisfactory for wound observation. Methodology from wound studies could not be directly compared to corns and callus [25, 26, 30], although values of k = 0.45–0.75 were ‘fair to good’.
The hypothesis upon which four nominally graded options for corns and callus were based involved ‘staging’ to show the critical nature of lesions with and without hallux valgus deformity. While no evidence of staging for epidermal thickening exists in the literature, skin that blisters following shoe rub can alter with epidermal thickening. While some resistance has been offered to expanding the grades further, errors could arise if the choice of selection becomes blurred. Where seven grades for shearing callus were used for pedal skin, classification became impractical when transferring definition from text to clinic. This was also found in a paediatric dental study where 10 levels were used. Observer raters observing enamel damage in paediatric teeth with photography fared less well when relating to degrees of enamel trauma rather than colour variation. Use of extensive lists of classifications, where the descriptor has large numbers of different options, can weaken the method’s effectiveness. Eight stages of classification used to describe fingertip injuries produced poor observational results.
It is acknowledged that, while more options might allow for easier classification, not all lesions could be classified into four categories. It would be unlikely, given both pilot study results and controlled study results, that 100% reliability could be achieved. While errors would not have significant consequences if keratin classification was mistaken, the key contribution could add to diagnostic unpredictability unless combined with other reliable instruments to provide a quality-related tool.
No two lesions are the same, and DEJ pathology varies widely, as the dimensions of depth change according to sub-dermal damage. Inevitably this makes assigning lesion grading more difficult. In a study where photographic observation of wounds included pressure ulcers, a large proportion of photographs were not stageable, even by the experts. This was often because eschar covering the wound made it impossible to judge the extent of tissue involvement. Where extravasation arises within dense keratin overlying callus, skilled debridement ensures the DEJ has not been penetrated. It is at this point that new judgement and appropriate management are considered.
Clinical examination may reach a finite point where lesion differentiation cannot be made conclusively, whether by direct observation or from photographs without debridement. In this regard, it is not contended that the use of a classification system will answer the clinician’s problems in isolation. Variations such as verrucae, fissures and pitted keratolysis must be excluded to avoid unintentional inclusion within the model. However, from recent analysis of excised lesions, the exclusion of HPV infections will have to be reconsidered by all clinicians involved in skin management and may need to be included within the descriptor. Furthermore, once the DEJ is breached, thus forming first an erosion, then an ulcer, a different system of classification should be assigned as new pathology enters the equation.
It may be reasonable to avoid using any classification model where too many conditions become enveloped under one ‘umbrella’ system. Prognosis and outcome could be underpinned by classification provided that quantitative methods are added, e.g. a visual analogue scale for pain and an assessment based on a validated health tool. Confounding errors arise more readily from photographs if the descriptors used to judge lesions are ambiguous. The difference between percentage of fibrin covering the wound versus area of epithelisation demonstrated this aspect of observation [25, 26]. Boundary definition and callus density within the lesion appear to suffer similar errors.
Debridement as cyclical treatment has been considered an important component of ‘Core Podiatry’, but there is no compelling argument for its continuance without change: the evidence shows that, for the low-risk categories, improvement in pain following debridement is not sustained unless the treatment is repeated [7, 8, 9, 10, 11]. Paradoxically, avoidance of cyclical management will offer more attraction to commissioners of health care. Inevitably, classification could help to prioritise patient management of callus, but without validation from other analytical methods, predictable outcomes will remain challenging.