Quantitative ultrasound imaging of Achilles tendon integrity in symptomatic and asymptomatic individuals: reliability and minimal detectable change

Background Quantifying the integrity of the Achilles tendon (AT) is a rehabilitation challenge. Adopting quantitative ultrasound (QUS) measurements of the AT could fill this gap. This study aimed to 1) evaluate the test-retest reliability and accuracy of QUS measurements of the AT and 2) determine the best protocol for collecting QUS measurements in clinical practice. Methods A total of 23 ATs with symptoms of Achilles tendinopathy and 63 asymptomatic ATs were evaluated. Eight images were recorded for each AT (2 visits × 2 evaluators × 2 images). Multiple sets of QUS measurements were taken: geometric (thickness, width, area), first-order statistics (computed from a grayscale histogram distribution: echogenicity, variance, skewness, kurtosis, entropy) and texture features (computed from co-occurrence matrices: contrast, energy, homogeneity). A generalizability study quantified the reliability and standard error of measurement (accuracy) of each QUS measurement, and a decision study identified the best measurement-taking protocols. Results Geometric QUS measurements demonstrated excellent accuracy and reliability. QUS measurements computed from the grayscale histogram distribution revealed poor accuracy and reliability. QUS measurements derived from co-occurrence matrices showed variable accuracy and moderate to excellent reliability. In clinical practice, using an average of the results of three images collected by a single evaluator during a single visit is recommended. Conclusions The use of geometric QUS measurements enables quantification of AT integrity in clinical practice and research settings. More studies on QUS measurements derived from co-occurrence matrices are warranted.


Background
The Achilles tendon (AT) is the largest and strongest tendon in the human body. The great tensile loads, occurring predominantly during its elongation or contraction, make it vulnerable to overuse injuries. Although the prevalence and incidence of midsubstance Achilles tendinopathy (i.e., in the middle third of the tendon) are high in athletes, cases are also frequently reported in sedentary individuals [1][2][3][4][5][6]. The aetiology and pathogenesis of AT tendinopathy have been the subject of much research, but with inconsistent findings [2,3,7]. Hence, treating people suffering from this pathology remains challenging for rehabilitation professionals and the success rate of conservative treatments is variable [8][9][10].
Ultrasound imaging allows in vivo visualization of the biological integrity of the tendon. It is a safe, rapid, noninvasive, relatively inexpensive and popular method used in the assessment of AT tendinopathy [11,12]. On ultrasound images (UIs) of healthy ATs, the well-organized, parallel alignment of the collagen fibres (i.e., fibrillar striation) is highlighted by alternating parallel bright bands (hyperechoic) of collagen and dark bands (hypoechoic) of extracellular matrix [12,13]. The paratenon of a healthy AT appears as an uninterrupted, well-defined bright line surrounding the tendon [12,13] (Fig. 1). Conversely, in people with midsubstance AT tendinopathy, the fibrillar striation pattern is often altered as a result of a disorganization of the collagen fibres, and a thickened and hypoechoic portion of the AT reflects an increase in the quantity of extracellular matrix and tenocytes [8,14,15]. This typically translates into focal thickening along the AT, the presence of dark (hypoechoic) intratendinous regions and sometimes irregular contours of the tendon on UIs [13] (Fig. 1).
Interpretation of a UI of the AT is generally semi-objective. The general appearance of the image is annotated based on the different contrasts observed (e.g., heterogeneous, homogeneous, focal or diffuse abnormalities) and the maximum thickness of the AT is often measured using a two-point digital caliper function on the ultrasound machine. This interpretation is largely influenced by the evaluator's experience with the recording technique and ability to interpret a UI [16,17]. Recent technological advances have helped to promote the development of new quantitative ultrasound (QUS) outcome measures extracted from a UI, specifically from a particular region of interest (ROI). Digital UIs can now be broken down into a multitude of micro pixels, and numerical values (e.g., average thickness, tendon width and area) can be measured. The echogenicity of an ROI within an image can also be quantified by allocating a numerical grayscale value to each of those micro pixels [18,19].
The usefulness of new UI analysis techniques has been demonstrated in various studies on animals and humans [20]. For example, these techniques have helped to quantify changes in the composition of an exercised muscle compared to an unexercised muscle in an elderly population [21][22][23]. These techniques have also revealed differences in the histological composition of the supraspinatus muscle and the quadriceps muscle in adults [24] and have been successfully used to detect structural changes in four key muscles in youths with neuromuscular disorders [25]. Moreover, new UI analysis techniques have enabled the differentiation of persons with Achilles tendinopathy from healthy individuals [26,27] and have been effective in detecting focal and diffuse abnormalities in the AT [28]. Very few studies have been conducted to evaluate the reliability of QUS measurements of the AT. This is worrisome considering that the reliability of the QUS measurement of AT thickness, a key diagnostic criterion for Achilles tendinopathy, is rarely reported. To our knowledge, studies that have investigated test-retest reliability of QUS measurements of the AT have shown a moderate to good level of reliability [29][30][31][32][33]. In addition, it was shown that ultrasound image recording is greatly influenced by the evaluator, even among highly experienced ultrasonographers (weak inter-evaluator reliability [34]). Various factors such as the pressure applied on the probe and its alignment can influence recorded image properties and thus alter the quantitative values extracted [35,36]. Information about reliability and minimal detectable change is essential in order to develop evidence-based measurement-taking protocols, empowering clinicians and researchers to quantify the tendinous changes observed in Achilles tendinopathy and incorporate these findings into clinical practice.
The primary objective of this study was to evaluate the reliability and minimal detectable change (MDC) of AT QUS measurements in people with symptoms consistent with midsubstance Achilles tendinopathy affecting at least one lower limb, as well as in completely asymptomatic individuals. The secondary objective was to recommend the best QUS measurement collection protocol possible, which could be subsequently used to characterize AT integrity in clinical practice or in research projects. It is anticipated that all QUS measurements, when collected by the same evaluator, will be reliable (Φ ≥0.75) and accurate (MDC NORMALIZED ≤ 15 %) and that a QUS measurement taking protocol in which a single evaluator averages the results of at least three images obtained during a single visit will be recommended in clinical practice.

Participants
A group of 20 individuals with clinical signs or symptoms of unilateral or bilateral midsubstance Achilles tendinopathy and a group of 23 asymptomatic individuals agreed to take part in this study. Individuals with symptoms consistent with Achilles tendinopathy had to have experienced pain for over four weeks, evoked pain on palpation in the middle third of the AT and a VISA-A score below 100. The VISA-A questionnaire [37], completed by all the participants, is a reliable and validated measurement tool that addresses AT pain and the ability to function in daily life and during athletic activities. Eight questions are summed to produce an overall score, which is used as an indicator of the pathology's severity. Scores range from 0 to 100, with a low score indicating greater severity. Asymptomatic participants had to have no pain or previous history of pain in the AT, no observable sign of Achilles tendinopathy or pain in the ankle, and a VISA-A score equal to 100 [38]. The criteria for the inclusion and exclusion of all participants, as well as each group's specific characteristics, are summarized in Fig. 2. Finally, before conducting any formal testing, ultrasound visualization of the two ATs was performed for each participant to verify their integrity (i.e., normal tendon structure) at and around the insertion and also to rule out complete rupture of the AT. This experiment was approved by the Centre for Interdisciplinary Research in Rehabilitation of Greater Montreal (CRIR) Research Ethics Committee (Certificate: CRIR-557-1110). Participants were fully informed of the nature of the study and asked to sign a consent form before participating.

Clinical examination
Initially, all the participants underwent a clinical examination conducted by a physiotherapist specialized in musculoskeletal disorders with over 10 years of experience. This examination aimed at detecting signs and symptoms typically present with midsubstance Achilles tendinopathy. Special attention was given to the visualization of the AT (without ultrasonography), with emphasis on finding the characteristic thickening sometimes present in its middle third, as well as evoked pain on palpation of the AT's middle third. A series of manoeuvres was carried out to apply passive and active tensions to selected structures with the intention of reproducing the participant's symptoms at the AT: manually resisted contraction of the triceps surae, passive stretch of the triceps surae, repeated unilateral heel rise test and repeated unilateral jump.

Ultrasound image recording
Device and settings
All of the ultrasound examinations were conducted using a Philips HD11 1.0.6 ultrasound machine with a 5-12 MHz 50 mm linear array transducer (Philips Medical Systems, Bothell, WA). Image field depth was set to 2 cm, gain was set to 85 dB, probe frequency to 12 MHz and a single focal zone (set at a depth of 0.5 cm) was positioned at the level of the AT. These main machine settings, as well as all the other options (e.g., compress, map, smooth, X-resolution), were maintained across all examinations in order to standardize the recorded images across all participants.

Ultrasonographers
A physiotherapist (M-J Nadeau) and a resident in physiatry (A. Desrochers) conducted all ultrasound examinations and recorded all of the AT images using a precise protocol (see next section). Both had previously received 10 h of practical training in AT ultrasound examination from an experienced physiatrist recognized by his peers in musculoskeletal ultrasonography (M. Lamontagne).

Image recording protocol
A summary of the image recording protocol is illustrated in Fig. 3.
Initial visit (test): Each participant was placed in a prone position, with both feet dangling over the end of the table, and ankles positioned at about 5° of plantar flexion using a splint to immobilize the foot. Once placed in this position, the AT's enthesis on the calcaneus was located by ultrasonography, and the skin marked at this location. The enthesis was defined as the most distal point of the insertion of the AT on the calcaneus. Another mark, made at a distance of 6 cm proximal to the enthesis, served as a standardized location for the center of the probe when recording all the ultrasound images. The images were recorded in this precise location since studies indicate that the incidence of Achilles tendinopathy is higher at this level (i.e., middle third) [2,7]. The first evaluator recorded two images of the AT in the longitudinal view, as well as two additional images in the transverse view. Between the recordings of these images, the probe was removed and then repositioned on the skin with the center of the probe aligned with the mark on the skin. Once these four images were recorded, the first evaluator erased all the marks drawn on the skin before the second evaluator repeated the same image collection protocol. Particular attention was paid to the probe's positioning on the tendon, taking care to apply minimal pressure on the probe and to align the transducer according to fiber orientation, with respect to the local reference frame (i.e., x, y, and z axes) defined by the tendon itself. Hence, the transducer was not necessarily perfectly aligned with the traditional anatomical planes and may have deviated slightly from them (i.e., yaw, roll and pitch movements of the transducer).
Second visit (retest): After a minimal 10-min rest period, the evaluators repeated the image collection sequence described above. No significant change in QUS measurements was anticipated as each participant had to remain at rest between the two sessions.

Image analysis
To calculate the different QUS measurements and facilitate characterization of the integrity of the AT, ultrasound images initially recorded in DICOM format were converted to JPEG format. An interactive 2D viewing and image analysis software, developed by the research team using MATLAB Image Processing Toolbox (The Mathworks, Natick, MA), was used to extract the QUS measurements. The development of this program was inspired by work previously carried out by a research team based at the University of Pittsburgh that used QUS measurements to characterize shoulder tendons (i.e., supraspinatus, biceps) [39][40][41]. Each image selected for analysis appeared on screen and the evaluators (i.e., the physiotherapist and a trained research associate) traced a standardized ROI directly on the image, using markers. For blinding purposes, all images recorded during visits 1 and 2 by a given evaluator were uploaded as a block of images prior to starting the image analysis. Thereafter, the images were presented to the evaluator in a random order for analysis. While conducting the image analysis, only the image appeared on the computer screen and all other information was masked with a black frame generated by the program. As described in detail below, the anatomical landmarks defining the ROIs differed between the longitudinal and transverse views of the AT (Fig. 1a and b).
In the longitudinal view images, the ROI included a 1-cm length area, centered in the middle of the image and captured 6 cm proximal to the tendon enthesis (Fig. 1a). For transverse view images, the ROI was defined by the tendon's contour (Fig. 1b). The ROI outline in both images was established to include the AT's fibres and exclude the paratenon.
These two ROIs were used to extract the following QUS measurements selected for this study: thickness, width (only for transverse images), area (only for transverse images), echogenicity, variance, skewness, kurtosis, entropy, contrast, energy and homogeneity.
Thickness The average thickness of the tendon's ROI is calculated in the longitudinal view. One hundred equidistant points are plotted respectively on the upper and lower edges of the AT and the distance between each pair of points is calculated. The 100 distance measurements are then averaged and represent the thickness. In the transverse view, the thickness of the AT is determined by encompassing the tendon with a rectangle (Fig. 1b). The height of the rectangle reflects the tendon's maximum thickness (Fig. 1e).
Width The width is determined from the rectangle that encompasses the tendon, as defined above. The rectangle's width reflects the tendon's width (Fig. 1b).
Area The tendon's area corresponds to the area of the region delimited by the tendon's outline.
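As an illustration only, the three geometric measurements described above can be sketched in Python. The function names, the representation of the tendon edges as functions of horizontal position, the assumption that paired points share the same horizontal position, and the use of an axis-aligned bounding rectangle are all choices made for this sketch; they are not taken from the authors' MATLAB software.

```python
def mean_thickness(upper_edge, lower_edge, x_start, x_end, n_points=100):
    """Average distance between paired points on the two tendon edges.

    upper_edge / lower_edge: hypothetical callables mapping a horizontal
    position x (mm) to the vertical position (mm) of each edge.
    Assumption: each pair of points shares the same x coordinate.
    """
    step = (x_end - x_start) / (n_points - 1)
    distances = []
    for k in range(n_points):
        x = x_start + k * step
        distances.append(abs(lower_edge(x) - upper_edge(x)))
    return sum(distances) / n_points


def bounding_rectangle(outline):
    """Width and height of the axis-aligned rectangle enclosing an outline.

    outline: list of (x, y) points traced around the tendon (transverse view).
    The height stands for the maximum thickness, the width for the tendon width.
    """
    xs = [p[0] for p in outline]
    ys = [p[1] for p in outline]
    return max(xs) - min(xs), max(ys) - min(ys)


def outline_area(outline):
    """Area enclosed by a closed polygonal outline (shoelace formula)."""
    twice_area = 0.0
    n = len(outline)
    for k in range(n):
        x1, y1 = outline[k]
        x2, y2 = outline[(k + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0
```

For example, a rectangular outline of 14 mm by 5 mm passed to `bounding_rectangle` and `outline_area` yields a width of 14, a height of 5 and an area of 70 (in mm and mm²).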
To calculate the other QUS measurements, the ROI is fragmented into multiple micro pixels (micro pixel = 0.0057 mm²) by the software and a numerical grayscale value is allocated to each micro pixel. The grayscale is a scale of shades used in imagery that ranges from 0 (black) to 255 (white), for a total of 256 possible shades.
The grayscale values of the micro pixels included in the ROI are initially represented by a grey level frequency distribution curve (grayscale histogram) (Fig. 1c and f). The following first-order statistics can be calculated from this distribution curve: echogenicity, variance, skewness, kurtosis and entropy. Additional information on QUS measurements is provided in Table 1.
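The first-order statistics listed above can be illustrated with a short Python sketch. It assumes textbook definitions (population variance, moment-based skewness and kurtosis, Shannon entropy in bits), which may differ from the exact formulas and normalizations used in the authors' software; the function name and the flat list of pixel values are hypothetical.

```python
import math
from collections import Counter


def first_order_statistics(pixels):
    """First-order statistics of the grayscale values (0-255) inside an ROI.

    pixels: flat list of integer grayscale values.
    Assumed definitions: population moments and Shannon entropy (base 2).
    """
    n = len(pixels)
    mean = sum(pixels) / n                      # echogenicity
    variance = sum((g - mean) ** 2 for g in pixels) / n
    sd = math.sqrt(variance)
    if sd == 0:
        # Skewness and kurtosis are undefined for a constant ROI;
        # they are reported as 0.0 here purely by convention.
        skewness, kurtosis = 0.0, 0.0
    else:
        skewness = sum((g - mean) ** 3 for g in pixels) / (n * sd ** 3)
        kurtosis = sum((g - mean) ** 4 for g in pixels) / (n * sd ** 4)
    counts = Counter(pixels)                    # grayscale histogram
    entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    return {
        "echogenicity": mean,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "entropy": entropy,
    }
```

Under these definitions, an ROI containing a single grey level has zero variance and zero entropy, in line with the description of entropy in Table 1.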
Next, a co-occurrence matrix is calculated. Texture analysis using a co-occurrence matrix is based on the repeated occurrence of a typical pixel configuration in the image's ROI. It considers how many pairs of pixels with specific grayscale values and a specific predefined spatial relationship (distance and relative orientation angle) are present in an ROI. In this study, pairs of pixels were counted in four directions (angles = 0°, 45°, 90°, 135°) at a distance of 10 pixels. The following texture indicator measurements are derived from this matrix: contrast, energy and homogeneity (Table 1).
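A minimal sketch of this texture analysis is given below in Python. Several points are assumptions rather than details confirmed by the text: the ROI is treated as a rectangular array, the matrix is not symmetrized, the three features follow common definitions (contrast as the sum of (i - j)² p(i,j), energy as the sum of p(i,j)², homogeneity as the sum of p(i,j) / (1 + |i - j|)), and the four directions are combined by simple averaging. All names are hypothetical.

```python
# Unit (row, column) offsets assumed for the four directions.
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def cooccurrence(image, d_row, d_col):
    """Normalized co-occurrence matrix as a dict {(i, j): probability}.

    image: list of rows of integer grayscale values (rectangular ROI assumed).
    """
    rows, cols = len(image), len(image[0])
    counts, total = {}, 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                pair = (image[r][c], image[r2][c2])
                counts[pair] = counts.get(pair, 0) + 1
                total += 1
    if total == 0:
        raise ValueError("offset is larger than the image")
    return {pair: n / total for pair, n in counts.items()}


def texture(p):
    """Contrast, energy and homogeneity of a normalized co-occurrence matrix."""
    contrast = sum((i - j) ** 2 * v for (i, j), v in p.items())
    energy = sum(v * v for v in p.values())
    homogeneity = sum(v / (1 + abs(i - j)) for (i, j), v in p.items())
    return contrast, energy, homogeneity


def texture_features(image, distance=10):
    """Average the three features over the four directions (an assumption)."""
    totals = [0.0, 0.0, 0.0]
    for d_row, d_col in DIRECTIONS.values():
        feats = texture(cooccurrence(image, d_row * distance, d_col * distance))
        totals = [t + f for t, f in zip(totals, feats)]
    names = ("contrast", "energy", "homogeneity")
    return {name: t / len(DIRECTIONS) for name, t in zip(names, totals)}
```

With these definitions, a uniform region gives zero contrast and maximal energy and homogeneity (both equal to 1), which matches the qualitative descriptions of the texture parameters.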
All of the QUS measurements can be classified into three categories: geometric measurements (thickness, width, area), measurements computed from a grayscale histogram (echogenicity, variance, skewness, kurtosis, entropy) and measurements computed from a co-occurrence matrix (contrast, energy, homogeneity).
It is expected that a healthy tendon would have a more heterogeneous appearance because of the alternation of its black and white stripes, with a larger range of values on the grayscale. A pathological area in a tendon would, in contrast, have a darker, more homogeneous appearance, with grayscale values closer to zero (black). Hence, the following QUS measurement values are expected to be found in a pathological tendon: increased thickness, width, area, skewness, kurtosis, homogeneity and energy, as well as reduced echogenicity, variance, entropy and contrast [40].

Statistical analysis
Outcome measures
The overall averages, standard deviations and confidence intervals of the results of the 8 images obtained for each QUS measurement in longitudinal and transverse views were calculated for all of the images for all tendons (n = 86) and separately for all of the images of symptomatic tendons (n = 23) and for all of the images of asymptomatic tendons (n = 63). The percentage difference between the averaged results of these two sub-groups was also calculated.

Reliability
Generalizability theory was used to determine the reliability of the different QUS measurements taken for the symptomatic and asymptomatic tendons. Based on the analysis of variance, this theory comprises two studies: the generalizability study (G-study) and the decision study (D-study) [42]. Unlike the traditional theory of reliability, which provides a single random error term, the G-study divides the error into different facets (sources of variance) relevant to the study and allows the magnitude of the variance attributed to each facet to be determined. Therefore, in this study, the G-study determined the magnitude of the variance attributed to the subject (S), evaluator (E), visit (V), image (I), and random errors resulting from the interactions between these different sources of variance (SE, SV, SI, EV, EI, VI, SEV, SEI, SVI, EVI), thus leaving much less unexplained variance. In the G-study, the variance component assigned to the subject (S) represents the difference between the subjects. This proportion of variance is error-free. The unexplained residual error is solely from the interaction between all sources of error and corresponds to the combination of the variances of subjects, evaluators, visits, and images (SEVI).

Table 1
QUS measurements computed from a grayscale histogram (first-order statistics)
Echogenicity: Mean of grayscale values of micro pixels encompassed within the ROI (from 0 (black) to 255 (white) inclusively).
Variance: Dispersion around the mean of the grayscale values of micro pixels encompassed within the ROI.
Skewness (Sk): Reflects the asymmetry of the grey level frequency distribution curve around its mean. A high coefficient (in absolute value) translates into a shifted distribution relative to the mean, while a zero coefficient indicates a symmetric distribution. In a positively skewed distribution, pixel intensities are biased toward lower values (distribution shifted to the left). In a negatively skewed distribution, pixel intensities are biased toward higher values (distribution shifted to the right).
Kurtosis (Kt): Reflects the flatness of the grey level frequency distribution curve around its mean. A diffuse distribution translates into a lower kurtosis value. A distribution concentrated around its mean translates into a higher kurtosis value.
Entropy (E): Reflects disorder in an ROI. It considers the number of grey levels in an ROI and the proportions of each grey level. Entropy increases when multiple grey level values are present in the ROI. Conversely, entropy equals zero if an image has a single grey level value for all its micro pixels.
QUS measurements computed from a co-occurrence matrix (texture parameters)
Contrast (Icon): Measures the difference of intensity between the grey level values of neighboring micro pixels. Contrast is reduced in a constant image with fewer local variations of the grey level intensities. Conversely, contrast is higher in an image containing many sudden local variations in grey level intensities.
Energy: Linked to the regularity and consistency of the patterns in an image. High energy is measured in a constant and steady picture. Conversely, low energy is found in an image in which the contacts of grey level values are diverse, uncoordinated and random.
Homogeneity: Increased in an image with a large number of pixels having the same grey level values, with little grayscale transition (i.e., increased when there is a large area of the same shade).
Note: I(x,y) denotes the grayscale intensity at the x,y coordinates in an ROI comprising M rows and N columns. p(i,j) represents the element of a grey-level co-occurrence matrix and denotes the probability that grayscale intensities i and j are adjacent.

Unlike the traditional theory of reliability, which assumes that reliability exists independently of the measurement protocol design, the D-study relies upon information generated from the G-study to determine the reliability of specific simulated protocol designs and provides information to optimize reliability according to, for example, the context in which the measurement is being used (e.g., clinical practice versus research). In this study, the impact of different experimental protocols on the reliability coefficients (Φ), standard error of measurement (SEM) and normalized minimal detectable change (MDC NORM) for each QUS measurement was determined. Since previous studies have documented that inter-rater reliability of QUS measurements is clearly inferior to intra-rater reliability, a single evaluator was used in the D-study. Improvements that may be obtained by averaging one to three images during a single visit or by averaging the images obtained during two visits by a single evaluator were compared. The G- and D-studies allow for the calculation of dependability coefficients (Φ), ranging from 0 (no reliability) to 1 (perfect reliability). In general, the dependability coefficient (Φ) can be interpreted as follows: poor reliability (Φ < 0.50), moderate (0.50 ≤ Φ < 0.75), good (0.75 ≤ Φ < 0.90) and excellent (Φ ≥ 0.90). However, there is some consensus that reliability indicators must exceed 0.90 for clinical measurements on an individual basis in order to minimize error and ensure reasonable validity [43]. More liberal reliability scores are allowed on a group basis, particularly for research purposes. The generalizability analysis was conducted using PC GENOVA statistical software, Version 2.2.
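To make the D-study logic concrete, the following Python sketch computes the absolute error variance and the dependability coefficient (Φ) for a simulated protocol from a set of G-study variance components. It assumes a fully crossed design in which all facets are treated as random, so that each component is divided by the number of evaluators, visits and images it involves; the authors' GENOVA analysis may have been specified differently. The function names and the numerical values used in the example are hypothetical.

```python
def d_study(components, n_e=1, n_v=1, n_i=3):
    """Absolute error variance and dependability coefficient for one design.

    components: dict of variance components keyed by facet letters,
    e.g. {"S": ..., "E": ..., "SE": ..., "SEVI": ...}.
    Assumption: fully crossed, all-random S x E x V x I design.
    """
    n = {"E": n_e, "V": n_v, "I": n_i}
    error_variance = 0.0
    for name, value in components.items():
        if name == "S":
            continue  # between-subject variance is not error
        divisor = 1
        for facet in name:
            divisor *= n.get(facet, 1)  # "S" inside an interaction divides by 1
        error_variance += value / divisor
    subject = components["S"]
    phi = subject / (subject + error_variance)
    return phi, error_variance


def interpret_phi(phi):
    """Qualitative interpretation using the thresholds given in the text."""
    if phi < 0.50:
        return "poor"
    if phi < 0.75:
        return "moderate"
    if phi < 0.90:
        return "good"
    return "excellent"


# Hypothetical variance components (not taken from the study's tables).
example = {"S": 8.0, "E": 1.0, "SEVI": 3.0}
phi_one_image, _ = d_study(example, n_e=1, n_v=1, n_i=1)
phi_three_images, _ = d_study(example, n_e=1, n_v=1, n_i=3)
```

In this hypothetical example, averaging three images instead of one reduces the contribution of the residual component and raises Φ, which is the mechanism by which a D-study compares candidate protocols.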

Standard error of measurement
Because the dependability coefficient (Φ) can be high despite substantial variability in the measurements, the standard error of measurement (SEM) has also been reported. The absolute SEM is estimated using the same units as the primary outcome measure. The SEM, which is the square root of the error variance, reflects the accuracy of a measurement.

Minimal detectable change
The absolute minimal detectable change (MDC ABS) was calculated to determine the extent of the absolute change required to detect a difference that could be interpreted as a real difference exceeding the measurement error. For a 90 % confidence level (z = 1.65), which is considered sufficient for clinical decision-making, the MDC ABS was calculated using the following equation:
$$\mathrm{MDC}_{\mathrm{ABS}} = \mathrm{SEM} \times 1.65 \times \sqrt{2}$$
In order for the MDC to be independent from the unit of measurement and to facilitate its interpretation, the MDC ABS has been subsequently normalized relative to the average obtained (MDC NORM) and calculated using the following equation:
$$\mathrm{MDC}_{\mathrm{NORM}} = \frac{\mathrm{MDC}_{\mathrm{ABS}}}{\mathrm{mean}} \times 100$$
MDC NORM ≤ 15 % reflects excellent measurement accuracy.
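The chain from error variance to normalized MDC can be sketched in Python as follows. The sketch assumes the commonly used form MDC = z × SEM × √2 with z = 1.65 for a 90 % confidence level, and normalization as a percentage of the mean; the function names and the numerical values in the usage note are hypothetical.

```python
import math


def sem_from_error_variance(error_variance):
    """SEM as the square root of the (absolute) error variance."""
    return math.sqrt(error_variance)


def mdc_absolute(sem, z=1.65):
    """Absolute MDC; z = 1.65 corresponds to a 90 % confidence level.

    Assumed form: MDC = z * SEM * sqrt(2), where sqrt(2) accounts for
    measurement error being present at both test and retest.
    """
    return z * sem * math.sqrt(2)


def mdc_normalized(mdc_abs, mean):
    """MDC expressed as a percentage of the mean of the measurement."""
    return 100.0 * mdc_abs / mean
```

For instance, with a hypothetical error variance of 0.04 mm² (SEM = 0.2 mm) and a mean thickness of 5.0 mm, the absolute MDC is about 0.47 mm and the normalized MDC about 9.3 %, which would fall under the 15 % threshold.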

Results
The overall averages, standard deviations and confidence intervals of the results of the 8 images obtained for each QUS measurement in longitudinal and transverse views for the complete set of tendons (n = 86) as well as for the symptomatic tendons (n = 23) and asymptomatic tendons (n = 63) are summarized in Table 2.

Sources of variance
The magnitude of each variance component (source of error), expressed as a percentage of the total variance for each QUS measurement for symptomatic and asymptomatic tendons, is presented in Tables 3 and 4 for images recorded in longitudinal and transverse views, respectively. Aside from the main source of variance associated with the subject (S) in most cases, the evaluator (E) is the systematic error with the highest variance percentage, up to 13.7 % of the total variance. The other systematic errors (visit and image) are negligible and vary from 0 to 1.9 % of the total variance. A significant proportion of random error is attributable to sources of variance that involve an interaction between the subject and the evaluator (SE, SEV, SEI) with proportions of up to 32.9 %, 40.0 %, and 16.3 % of the total variance, respectively. The contribution of the other errors (SV, SI, EV, EI, VI, SVI, EVI) is lower, with percentages ranging from 0 to 10.8 %, where 10.8 % represents the SVI interaction. The unexplained residual error (SEVI) is variable (1.5 to 27.3 %) for all of the measurements, with the exception of kurtosis (21.1 to 43.9 %) and skewness (22.7 to 39.2 %), for which it remains slightly higher.

Reliability and minimal detectable change
The reliability and MDC of different hypothetical QUS measurement acquisition protocol designs are described in Tables 5 and 6 for the images in longitudinal and transverse views, respectively. Different trends in reliability and MDC of the QUS measurements are observed for the three main measurement categories.
The reliability and MDC of the results for the protocol design in which the QUS measurement results of three images taken by a single evaluator in a single visit (E = 1, V = 1, I = 3) are averaged were compared for the three main measurement categories. This measurement scenario is compatible with clinical practice.

Geometric measurements
In general, these QUS measurements have shown good to excellent reliability, with dependability coefficients ranging from 0.88 to 0.98 and good accuracy with a MDC 90% NORM <15 % obtained in most cases. Only the thickness of symptomatic ATs in longitudinal view had a MDC 90% NORM value greater than 15 % (MDC 90 % NORM = 23.66 %) which still remains acceptable.

Measurements computed from a grayscale histogram
Echogenicity stands out in this category by its excellent results with dependability coefficients ranging from 0.88 to 0.92 and a MDC90% NORM ranging from 8.56 to 15.51 %. The entropy also seems promising with a tendency for slightly better reliability than other QUS measurements in this category (all Φ =0.77 except for one Φ = 0.34), and an excellent MDC 90% NORM ranging from 2.18 to 4.95 %. However, the other QUS measurements in this category have shown only weak to moderate reliability, with most dependability coefficients below the threshold of 0.75 (Φ range = 0.49-0.79). These measurements also showed a large MDC90% NORM (with the exception of entropy) ranging from 26.09 to 76.02 %.

Measurements computed from a co-occurrence matrix
In general, these QUS measurements have shown moderate to excellent reliability, with Φ ranging from 0.69 to 0.92, and a variable MDC90% NORM.

Impact of averaging results from multiple images or visits
D-study reliability and error measurement estimates were computed for 6 experimental designs (Tables 5 and 6). Improved reliability and decreased MDC were obtained in all cases by increasing the number of recorded images. Slightly larger improvements in reliability and MDC were observed by recording images from two visits, in all cases.

Discussion
Evaluator as a significant source of variability
The evaluator represents a significant source of variability in the recording of UIs. The high level of technical skills and manual dexterity during UI recording requires extensive clinical experience and may contribute to variability associated with the evaluator [44]. In the present study, the limited experience of both ultrasonographers (i.e., evaluators) might have increased the variability of the evaluator (E) facet. However, Gellhorn et al. revealed excellent inter-rater reliability for the cross sectional area (CSA) measurement of the patellar tendon (comparable to the AT in terms of shape, content and superficial location) between a novice and an experienced sonographer [45]. The important interplay between the evaluator and the participants during a UI recording is highlighted by the high proportion of variance in all QUS measurements attributed to sources of error involving interaction between the subject and the evaluator (SE, SEV, SEI). Every AT is different and those anatomical and physiological differences (oblique orientation of the tendon, the subject's weak natural echogenicity, blurred outline of the tendon, etc.) are expressed in the subject facet (S). These dissimilarities make capturing an image of certain tendons more challenging than others and may explain why an evaluator may have more difficulty in assessing some subjects than others (SE). Consequently, it is recommended that a single evaluator record US images and extract QUS measurements, particularly when the goal is to monitor treatment effects over time.

Superiority of geometric measurements and echogenicity
The excellent results obtained in this study, in terms of reliability and accuracy, of geometric QUS measurements of area, thickness and width (Φ obtained mostly above the clinically acceptable threshold of 0.90 and a MDC90% NORM <15 %) are similar to or better than those obtained in comparable studies targeting the AT [31][32][33]46]. Although echogenicity is a measurement computed from a grayscale histogram, it behaves as a geometric measurement and has also shown excellent reliability and accuracy (almost all Φ > 0.90 and MDC NORM < 15 %). Echogenicity has been previously studied, mainly on muscles, and has shown good reliability for repeated measurements (variation coefficients ranging from 5 to 11 %) [47][48][49]. Strict compliance with a standardized measurement-taking protocol and the use of software to extract the geometric QUS measurements may explain, among other reasons, the favourable results obtained in this study. Continued use of the geometric QUS measurements, as well as echogenicity, is therefore encouraged in quantifying AT integrity. The poor results, in terms of reliability and MDC, of the QUS measurements computed from a grayscale histogram (variance, skewness, kurtosis) obtained in this study confirm the need for refinement and further study before advocating the use of these QUS measurements in the assessment of AT integrity. For a hypothetical protocol in which the evaluator averages the results of three images recorded during a single visit (E = 1, V = 1, I = 3), the dependability coefficients obtained for variance, skewness and kurtosis range from 0.49 to 0.79, with the majority of measurements falling under the threshold of 0.75 established to ensure good reliability. The accuracy results are also disappointing (MDC90% NORM ranging from 26.09 to 76.02 %).
Only entropy stands out for its general good reliability (all Φ =0.77 except for one Φ = 0.34) and high accuracy (MDC 90% NORM ranging from 2.18 to 4.95 %) in this study when using the above-described protocol.
The reliability of QUS measurements computed from a grayscale histogram was also studied previously by two research teams that obtained similar results to those of the present study. Collinger et al. assessed the reliability and accuracy of various QUS measurements extracted from longitudinal images of the long head of the biceps and the supraspinatus tendons based on the generalizability theory [39]. The reliability and accuracy scores, determined with a D-study for an E = 1, V = 1, I = 2 protocol, are similar to those of the present study. Good reliability and accuracy of thickness (Φ ranging from 0.92 to 0.94; MDC90% NORM ranging from 9.42 to 14.49 %) and echogenicity (Φ ranging from 0.79 to 0.85; MDC90% NORM ranging from 16.03 to 18.72 %), as well as low reliability and weak accuracy of the variance, skewness and kurtosis were found (Φ ranging from 0.57 to 0.69; MDC90% greater than 15 % and up to 297.35 %). Entropy demonstrated moderate reliability and good accuracy (Φ ranging from 0.64 to 0.68; MDC90% ranging from 5.56 to 5.58 %). Slightly better results in terms of reliability of the QUS measurements were obtained in the present study compared to the study by Collinger et al. The superior reliability of the AT QUS measurements is possibly explained by the fact that this tendon is easier to assess than shoulder tendons due to its superficial position, alignment [...] [49]. This team also obtained results consistent with those in this study, that is, low reliability in these QUS measurements, with variation coefficients (analogous to SEM expressed as a percentage of the grand mean) ranging from 14 to 35 %. Knowledge of the theoretical foundations and the underlying calculation of the different QUS measurements computed from a grayscale histogram is essential in understanding the disappointing results, in terms of reliability and accuracy, of these measurements.
The skewness and kurtosis QUS measurements are calculated directly from the shape of the grey level frequency distribution curve (grayscale histogram), while the shape of this curve also influences variance and entropy. For its part, the QUS measurement of echogenicity is an average of the grey scale values for all of the pixels in the ROI and does not take into account the shape of the distribution curve. The appearance of an anatomical structure on an ultrasound image can vary significantly depending upon the angle and the pressure applied to the tissues with the probe, both in terms of shape and echogenicity [50]. The edges of the ROI are sensitive to these variations in echogenicity, which can change the shape of the grey level frequency distribution curve without having a significant influence on the echogenicity's average value for the ROI.
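The distinction drawn above between echogenicity (a plain average) and the shape-dependent measurements can be illustrated with a minimal sketch. It is not the software used in this study: the function name `first_order_stats`, the 8-bit grey-level range, the use of non-excess kurtosis and the base-2 logarithm for entropy are all assumptions made for illustration.

```python
import numpy as np

def first_order_stats(roi, levels=256):
    """First-order statistics of an integer grayscale ROI (hypothetical helper).

    Assumes pixel values are non-negative integers below `levels`.
    """
    pixels = np.asarray(roi).astype(np.int64).ravel()
    # Normalised grey-level histogram: p[g] is the fraction of pixels at level g.
    p = np.bincount(pixels, minlength=levels) / pixels.size
    g = np.arange(p.size)

    mean = np.sum(g * p)                 # echogenicity: average grey level
    var = np.sum((g - mean) ** 2 * p)    # spread of the histogram
    std = np.sqrt(var)
    # Skewness and kurtosis depend only on the shape of the histogram.
    # They are undefined for a flat ROI, so 0.0 is returned in that case.
    skew = np.sum((g - mean) ** 3 * p) / std ** 3 if std > 0 else 0.0
    kurt = np.sum((g - mean) ** 4 * p) / std ** 4 if std > 0 else 0.0
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log2(nonzero))  # in bits (assumption)

    return {
        "echogenicity": float(mean),
        "variance": float(var),
        "skewness": float(skew),
        "kurtosis": float(kurt),
        "entropy": float(entropy),
    }

# Toy 2 x 2 ROI with two grey levels in equal proportion.
toy_roi = np.array([[0, 0], [2, 2]])
print(first_order_stats(toy_roi))
```

Because skewness and kurtosis use third and fourth powers of the deviation from the mean, a few outlying pixels at the ROI edges weigh far more heavily on them than on the mean, which is consistent with the explanation given above.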

Differences between transverse and longitudinal AT images
In this study, reliability and MDC of QUS measurements are generally similar between transverse and longitudinal images. The QUS measurement of thickness in a longitudinal view of symptomatic ATs is the exception to the rule as it tends to be less reliable (−8.58 %) and less accurate (−48.74 %) than in a transverse view. This difference might be explained by the fact that when the probe is positioned longitudinally to the AT fibres, it can be repositioned at different locations or angles on the tendon's sagittal plane for each image (Fig. 1e). In a longitudinal view, the thickness measurement is captured only for a slice located directly under the probe and it is difficult to ensure that it is located on the thickest portion of the tendon. Therefore, it appears preferable that AT QUS thickness measurements are also taken in the transverse view, at a location considered relevant and determined following a full excursion of the transducer along the AT in both planes. Other thickness measures previously reported in the literature (e.g., true thickness measure) may also deserve to be explored in the future, especially with regard to the thickness measured in the transverse view [32]. In addition, since our QUS thickness measurement in the longitudinal view reflects the average thickness of a targeted area, it is likely that its value is reduced in comparison to the maximum thickness found in this region captured in the transverse view of the AT.

Co-occurrence matrix shows promise in quantitative ultrasound imaging
The co-occurrence matrix is an image analysis method that considers the spatial organization of the pixels, as opposed to the grayscale histogram that only considers the grey scale values of the pixels, without taking into account their position on the image or their interaction with the surrounding pixels [18,19,51]. In our study, better reliability was achieved for QUS measurements drawn from a co-occurrence matrix in comparison to the reliability of the QUS measurements computed from a grayscale histogram. Collinger et al. obtained similar results [39]. Superior reliability may be explained by the fact that the co-occurrence matrix studies pairs of pixels, and not the isolated value of each pixel's grey level.
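As a rough illustration of how a co-occurrence matrix captures spatial organization that a histogram cannot, the sketch below builds the matrix for one pixel offset and derives contrast, energy and homogeneity. The helper names `glcm` and `texture_features`, the one-pixel horizontal offset, the symmetric normalisation and the exact feature formulas are assumptions; definitions of energy and homogeneity vary between software packages, and the ones used in this study are not restated here.

```python
import numpy as np

def glcm(roi, levels, dx=1, dy=0):
    """Symmetric, normalised co-occurrence matrix for offset (dx, dy), with dx, dy >= 0."""
    img = np.asarray(roi, dtype=np.int64)
    rows, cols = img.shape
    ref = img[:rows - dy, :cols - dx]   # reference pixels
    nbr = img[dy:, dx:]                 # their neighbours at the chosen offset
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1.0)  # count each grey-level pair
    m = m + m.T                         # count each pair in both directions
    return m / m.sum()

def texture_features(p):
    """Contrast, energy and homogeneity of a normalised co-occurrence matrix."""
    i, j = np.indices(p.shape)
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        # Energy taken here as the sum of squared entries (assumption);
        # some packages report the square root of this quantity instead.
        "energy": float(np.sum(p ** 2)),
        # Homogeneity with a squared-difference denominator (assumption).
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
    }

# Two toy images with identical histograms (two 0s, two 1s) but different layouts.
stripes = np.array([[0, 0], [1, 1]])
checker = np.array([[0, 1], [1, 0]])
print(texture_features(glcm(stripes, levels=2)))
print(texture_features(glcm(checker, levels=2)))
```

The two toy images are indistinguishable by any histogram-based measurement, yet their contrast and homogeneity differ, because these features are computed from pairs of neighbouring pixels.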

Proposing a measurement collection protocol for clinical practice
In clinical practice, it is difficult to consider having more than one assessment visit in which additional images would be recorded (V = 2). Even though one or more additional visits positively influence the reliability and accuracy of QUS measurements, productivity constraints should be considered. A protocol in which a single evaluator averages the results obtained from three images collected during a single visit (E = 1, V = 1, I = 3) seems to represent a good compromise. The clinical applicability of AT QUS measurements becomes highly relevant since the time required for recording them with this protocol is at most 10 min, so the measurements can support the clinical decision-making process.
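The trade-off between protocols can be sketched with a generic decision-study calculation. The example below assumes a fully crossed random design (subject × evaluator × visit × image), the usual expression for the absolute error variance, and MDC90 = 1.645 × √2 × SEM normalised by the grand mean. The function name `d_study` and all variance component values are hypothetical and are not results of this study.

```python
import math

def d_study(components, n, grand_mean):
    """Dependability coefficient, SEM and normalised MDC90 for one protocol.

    components: variance estimates keyed by facet label, e.g. "S", "SE", "SEVI"
                (S = subject, E = evaluator, V = visit, I = image).
    n:          number of levels averaged over for each facet, e.g. {"E": 1, "V": 1, "I": 3}.
    """
    abs_error = 0.0
    for label, variance in components.items():
        if label == "S":
            continue  # subject variance is the signal, not error
        divisor = 1
        for facet in label:
            if facet != "S":
                divisor *= n[facet]  # averaging over a facet shrinks its error
        abs_error += variance / divisor

    phi = components["S"] / (components["S"] + abs_error)
    sem = math.sqrt(abs_error)
    mdc90_norm = 100.0 * 1.645 * math.sqrt(2) * sem / grand_mean
    return phi, sem, mdc90_norm

# Hypothetical variance components, for illustration only.
toy_components = {"S": 9.0, "E": 0.5, "SE": 1.0, "SV": 0.5, "SEVI": 3.0}

for n_images in (1, 2, 3):
    phi, sem, mdc = d_study(toy_components, {"E": 1, "V": 1, "I": n_images}, grand_mean=50.0)
    print(f"I = {n_images}: phi = {phi:.2f}, SEM = {sem:.2f}, MDC90 NORM = {mdc:.1f} %")
```

In such a calculation, averaging more images only reduces the components that involve the image facet, which is why adding images beyond a small number yields diminishing returns when evaluator-related or visit-related components dominate.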

Study limitations
This study has several limitations. Within the context of this reliability study, all measurements were taken in an identical location across all participants according to a standardized protocol. The location selected was set at 6 cm proximal to the enthesis of the AT considering that AT tendinopathy typically occurs between 2 and 6 cm proximal to its enthesis on the calcaneus [52,53]. Hence, measurements of the symptomatic tendons were not necessarily taken at the pathology's precise location for all participants which, in turn, may have minimized the variance between the pathological tendons and led to an underestimation of the reliability and accuracy of the measurements. Two separate tasks, which are both dependent upon the evaluator, must be performed when obtaining QUS measurements. The first task involves obtaining the ultrasound image (image acquiring) and the second consists of processing the acquired image in order to extract the desired quantitative values (image analyzing). Both "image acquiring" and "image analyzing" can affect reliability separately. For example, when the evaluator plots the delineation of the ROI while analyzing the UIs, the sometimes-blurred outline of the AT increases this task's difficulty. Syha et al. found that thickness measurements of the AT were more reliable when the ROI was traced automatically than when it was traced manually [54,55]. In the present study, it is impossible to differentiate the error related to the recording of the image from that related to the analysis of the image. Further studies are required to isolate these potential sources of variability that are currently encompassed within the evaluator facet. Another source of variability and potential limitation of the study is that the integrity of a three-dimensional tendon is quantified using two-dimensional UIs.

Conclusions
This study focused on the reliability of three types of QUS measurements: geometric QUS measurements, QUS measurements computed from a grayscale histogram, and QUS measurements computed from a co-occurrence matrix. Even though additional validity and responsiveness studies are necessary, the favourable results of geometric QUS measurements and of echogenicity further support their use in clinical practice and research protocols. These measurements could be used in longitudinal follow-up to capture changes in the severity of AT tendinopathy and the impact of treatment in clinical practice or in rehabilitation research protocols. These measurements may also be useful in a cross-sectional context in order to compare individuals with one another or against standards established in clinical practice or in future studies. Furthermore, it is imperative that particular emphasis be given to adhering to a rigorous, standardized measurement-taking protocol when acquiring UIs to reach an acceptable level of reliability and accuracy. With respect to the other QUS measurements computed from grayscale histograms (variance, skewness, kurtosis), in light of the results obtained in this study, their use in evaluating AT integrity should be reconsidered. Lastly, QUS measurements computed from a co-occurrence matrix are promising and additional studies on this emerging method of image analysis are necessary.