A review of the foot function index and the foot function index – revised

Background The Foot Function Index (FFI) is a self-report, foot-specific instrument measuring pain and disability and has been widely used to measure foot health for over twenty years. A revised FFI (FFI-R) was developed in response to criticism of the FFI. The purpose of this review was to assess the uses of the FFI and FFI-R as reported in the medical and surgical literature and to address the suggestions found in the literature for improving the metrics of the FFI-R.
Methods A systematic literature search of the PubMed/Medline and Embase databases from October 1991 through December 2010 provided the main sources of literature. To enrich the bibliography, the search was extended to the BioMedLib and Scopus search engines and to manual search methods. Search terms included FFI, FFI scores, and FFI-R. Requirements included abstracts/full-length articles, English-language publications, and articles containing the term "foot complaints/problems." The selected articles were scrutinized; EBM abstracted data from the literature and collected them into tables designed for this review. EBM analyzed the tables; KJC, JM, and RMS reviewed and confirmed the table contents. KJC and JM reanalyzed the original FFI-R database to improve its metrics.
Results Seventy-eight articles qualified for this review, and their abstracts were compiled into 12 tables. The FFI and FFI-R were used in studies of foot and ankle disorders in more than 4700 people worldwide. The FFI full scale, its subscales, and the FFI-R were used as outcome measures in various studies; new instruments were developed based on the FFI subscales. The FFI full scale was adapted or translated for other cultures. The psychometric properties of the FFI and FFI-R are reported in this review. Reanalysis of the FFI-R subscales confirmed unidimensionality, and the response categories of the FFI-R questionnaires were edited into four responses for ease of use.
Conclusion This review was limited to articles published in English in the past twenty years. The FFI is used extensively worldwide; this instrument pioneered a quantifiable measure of foot health and has thus shifted the paradigm of outcome measurement toward subjective, patient-centered, valid, reliable, and responsive hard data endpoints. Editing the FFI-R into four response categories will enhance its user friendliness for measuring foot health.


Background
Foot problems commonly arise during our daily living activities [1,2]. The prevalence of foot problems in general ranges between 10% and 24% [3]. Their prevalence is higher among older individuals and in chronic rheumatoid arthritis (RA), gout, and diabetes mellitus with peripheral neuropathy [4]. Foot pain and disability can affect workers' productivity, work absenteeism, and other issues [5,6]. Because pain and disability are subjective complaints, they are difficult to quantify without a valid patient report of the degree to which an individual is experiencing foot pain. Without a valid measure, problems arise in documenting foot health status, tracking the progression of diseases, and establishing the efficacy of treatment, including assessment of treatment satisfaction and of health related quality of life from a personal perspective.
In 1991, the Foot Function Index (FFI) was developed as a self-reporting measure that assesses multiple dimensions of foot function on the basis of patient-centered values. The FFI consists of 23 items divided into 3 subscales that quantify the impact of foot pathology on pain, disability, and activity limitation in patients with RA [7]. The FFI was developed using the classical test theory (CTT) [8] method. It has been found to have good reliability and validity and has had wide appeal to clinicians and research scientists alike [3,9,10]. In the past 20 years, the FFI has been widely used by clinicians and investigators to measure pain and disability in various foot and ankle disorders and its use has expanded to involve children, adults, and older individuals. Furthermore, the FFI has been widely used in the study of various pathologies and treatments pertaining to foot and ankle problems such as congenital, acute and chronic diseases, injuries, and surgical corrections.
In 2006, the FFI was revised (the FFI-R) on the basis of criticisms from researchers and clinicians; items were added, including a scale to measure psychosocial activities and quality of life related to foot health [11].
A literature review was conducted to develop a theoretical model of foot functioning [12], based on the World Health Organization International Classification of Functioning (ICF) model. The FFI-R items were developed from the original 23 FFI items, and more items were added as a result of the literature review. As a result of clinicians and patients' input, the final draft of the FFI-R, which consisted of 4 subscales and 68 items, was completed. The results were the FFI-R long form (FFI-R L; 4 subscales and 68 items) and the FFI-R short form (FFI-R S; 34 items) as total foot function assessment instruments. Both the 68-item and 34-item measures demonstrated good psychometric properties.
The FFI-R in its current form is one of the most comprehensive instruments available. However, in a review article [13], questions were raised about the unidimensionality and independence of FFI-R subscales, and we did not include such reports in our previous article about the FFI-R [11]. We carefully reviewed the comments about the FFI-R and assessed the unidimensionality of the subscales by use of the Rasch model. On the basis of these critiques, the FFI-R required a periodic revision of its metrics to ensure it represented patient-centered health values and state-of-the-art methodology.
Our aim is to assess the contribution of the FFI and FFI-R to the measurement of foot health in the fields of rheumatology, podiatry, and orthopedic medicine. This assessment should enable us to reflect on and improve the quality of the measure. Therefore, we conducted a systematic review of literature pertaining to the FFI and FFI-R that was published in the English language from October 1991 through December 2010. The objectives were to: (i) assess the prevalence of uses of the FFI and FFI-R in clinical studies of foot and ankle disorders; (ii) describe the utility and clinimetric properties of the FFI and FFI-R as they have been applied in various clinical and research settings; (iii) enumerate the strengths and weaknesses of the FFI and FFI-R as reported in the literature; and (iv) address the suggestions found in the literature for improving the FFI-R metrics.

Methods for systematic search of the literature
This study was a systematic review of articles in which the FFI and/or FFI-R were used as measures of a variety of foot and ankle problems. Relevant studies were identified by searching the electronic bibliographic databases PubMed/MEDLINE, EMBASE, BioMedLib, and Scopus for English-language publications from October 1991 through December 2010.

Search terms and eligibility criteria
The key words foot function index, FFI scores, foot function index scores, and foot function index revised (FFI-R) were used as search terms and were applied to all databases. FFI instruments/measure and/or FFI-R instruments/measure had to be mentioned in the abstracts and in the full articles to be collected for in-depth scrutiny. Articles fulfilling the inclusion criteria were selected for the review. The article criteria included: (i) the words foot function index/FFI or revised foot function index/FFI-R in its reports/measures; (ii) full-length articles; (iii) written in English and published from October 1991 through December 2010; (iv) the study population described needed to have foot complaint(s)/problems; and (v) regardless of the country conducting the study, the full-length article must have been published in English or in a foreign language with the abstract in English.

Objectives with method of data collection and organization of tables
Selected articles that fulfilled the criteria were independently reviewed and collected by the authors to address the objectives and organize collected data into several tables.

Objective 1. Uses of the FFI and FFI-R
We created four tables to address the first objective of describing the measurement's uses (Tables 1, 2, 3, and 4).

Objective 2. Utility and clinimetric properties
We designed a data-collection form to address the second objective. This form was assessed in a pilot study by collecting data from ten of the qualified articles; it was revised before being used in its current format. The variables used in this data-collection form were: (i) the instrument and year the article was published; (ii) the first author's name; (iii) the objectives of the study; (iv) the population characteristics, sample size, and diagnosis; (v) psychometric analysis (reliability and validity, etc.); (vi) items/domains/subscales of the FFI or FFI-R used in the study; (vii) response type; and (viii) a short summary evaluation of each study. This data form thus recorded the analytic statements extracted from each article, and 6 tables were created (Tables 5, 6, 7, 8, 9, and 10). Data were arranged in each table in chronological order.

Objective 3. Strengths and weaknesses of the FFI and FFI-R as reported in the literature
This was a qualitative summary of the results as found in Table 5 and Table 6.

Objective 4. Improving the FFI-R metrics
Table 11 summarizes results of the Rasch analysis. This was a reanalysis of the FFI-R database collected in 2002 with the aim of improving FFI-R metrics.

Descriptive analysis methods
Quantitative data were reported using simple statistics expressed as sums, means, and standard deviations for continuous variables and as frequencies for categorical data (Tables 1, 2, 3, and 4). Analytic statements and evaluations/comments for each article collected are summarized in Table 12, which depicts the summary of FFI and FFI-R uses as illustrated in Objective 2 and in six tables (Tables 5, 6, 7, 8, 9, and 10).

Rasch analysis method
To address specific critiques of the FFI-R found in the literature, the unidimensionality of the FFI-R and its subscales was evaluated against the Rasch model. The statistical package Winsteps version 3.72.3 [14] was used to conduct a principal components analysis (PCA) of the standardized residuals to determine whether substantial subdimensions existed within the items [15][16][17] and whether the FFI-R L, the FFI-R S, and the 5 subscales were unidimensional. The criterion used to define unidimensionality was a large variance (> 40%) explained by the measurement dimension [18]. Unexplained variance in the first contrast of the data should be small and fall under the criterion of 15% for a rival factor. We also required the variance explained by the measurement dimension to be at least 3 times the variance of the first component of residuals [19].
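The three criteria above can be expressed as a simple check. The following is a minimal sketch, not the Winsteps procedure itself: the class and function names are hypothetical, and the inputs are assumed to be percentages of total variance read from a PCA-of-residuals summary. The example values are those reported for the FFI-R L under Objective 4.

```python
from dataclasses import dataclass


@dataclass
class PcaSummary:
    """Hypothetical container for PCA-of-residuals output (percent of total variance)."""
    explained_by_measure: float  # variance explained by the measurement dimension
    first_contrast: float        # unexplained variance in the first contrast


def variance_ratio(pca: PcaSummary) -> float:
    """Ratio of variance explained by the measure to variance in the first contrast."""
    return pca.explained_by_measure / pca.first_contrast


def meets_unidimensionality(pca: PcaSummary,
                            min_explained: float = 40.0,
                            max_first_contrast: float = 15.0,
                            min_ratio: float = 3.0) -> bool:
    """Apply the three criteria stated in the text (thresholds are defaults here)."""
    return (pca.explained_by_measure > min_explained
            and pca.first_contrast < max_first_contrast
            and variance_ratio(pca) >= min_ratio)


# Values reported for the FFI-R L (68 items): 56.8% and 10.6%.
ffi_r_long = PcaSummary(explained_by_measure=56.8, first_contrast=10.6)
print(round(variance_ratio(ffi_r_long), 1), meets_unidimensionality(ffi_r_long))
```

With these inputs the ratio rounds to 5.4, which agrees with the value reported for the long form.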

Rasch reliability statistics
Reliability was estimated with Cronbach's Alpha and Rasch person reliability statistics. Both indices reflect the proportion of variance of the person scores or measures to total variance (i.e., including measurement error). Unlike Cronbach's Alpha, Rasch person reliability is based on the estimated locations of persons along the measurement continuum, excluding those with measures reflecting extreme (zero or perfect) scores and including cases with missing data. For both indices, our criterion for acceptability was .80.
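For readers less familiar with the two indices, the sketch below shows the conventional formulas under simplifying assumptions: Cronbach's Alpha is computed from complete item-level data, and Rasch person reliability is computed as the share of observed variance in person measures that is not attributable to measurement error. The function names and the toy inputs are illustrative only; Winsteps may differ in details such as the handling of extreme scores.

```python
from statistics import mean, pvariance, variance


def cronbach_alpha(scores):
    """scores: list of respondents, each a list of k item scores (no missing data)."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def rasch_person_reliability(measures, standard_errors):
    """(observed variance - mean error variance) / observed variance.

    measures: person measures in logits, extreme scores assumed already excluded.
    standard_errors: the standard error attached to each person measure.
    """
    observed = pvariance(measures)
    error = mean([se ** 2 for se in standard_errors])
    return (observed - error) / observed


def acceptable(reliability, criterion=0.80):
    """The acceptability criterion stated in the text."""
    return reliability >= criterion


# Toy data, not from the FFI-R database.
toy_measures = [-2.0, -1.0, 0.0, 1.0, 2.0]
toy_errors = [0.5] * 5
print(acceptable(rasch_person_reliability(toy_measures, toy_errors)))
```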

Response category analysis
One requirement of the Rasch model is monotonicity: the requirement that, as person ability increases, the item step response function increases monotonically [20]. This means that the probability of choosing one categorical response over the prior one (for example, moving from selecting "2 = A little of the time" to selecting "3 = Most of the time") increases with person ability. The proper functioning of the rating scale is examined using fit statistics, where: (i) outfit mean squares should be less than 2.0, (ii) average measures advance monotonically with each category, and (iii) step calibrations increase monotonically [21,22].
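The three rating-scale checks listed above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis actually run: the function names are hypothetical, the inputs are assumed to be per-category statistics ordered from the lowest to the highest category, and the numbers are invented.

```python
def is_strictly_increasing(values):
    """True when each value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def rating_scale_functions(outfit_mnsq, average_measures, step_calibrations,
                           max_outfit=2.0):
    """Apply criteria (i)-(iii) from the text to one rating scale."""
    return (all(m < max_outfit for m in outfit_mnsq)          # (i) outfit < 2.0
            and is_strictly_increasing(average_measures)      # (ii) average measures advance
            and is_strictly_increasing(step_calibrations))    # (iii) step calibrations advance


# Invented values for a four-category scale (three steps between categories).
ok = rating_scale_functions(
    outfit_mnsq=[1.1, 0.9, 1.0, 1.3],
    average_measures=[-1.8, -0.6, 0.4, 1.5],
    step_calibrations=[-1.2, 0.1, 1.1],
)
print(ok)
```

A scale whose step calibrations are disordered, as can happen when respondents cannot distinguish two adjacent categories, fails check (iii) and becomes a candidate for collapsing categories.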

Review of the literature
Articles were obtained by using the search method defined in the Methods section; the search results included 752 articles from PubMed/MEDLINE and 640 articles from Embase. Further screening and selection procedures, as detailed in Figure 1, yielded 182 full-text articles. Of these, 53 articles qualified for review. Twenty-five more articles were obtained from the search engine BioMedLib and from manual searches. A total of 78 articles qualified for this review; these were summarized and categorized into several tables.

Objective 1: Assessment of the prevalence of the FFI or FFI-R usage, population characteristics, and study locations
Among the 78 studies, we identified 4714 study participants for whom the FFI or FFI-R instrument had been used to measure foot health. This sample consisted of 1914 (41%) male participants and 2688 (57%) female participants, with a mean age of 48.58 years (SD, 4.9 years). There was a discrepancy of 2% between the sums of male and female participants, because gender was not reported in three studies (Table 1). Most of the participants were individuals and young adults, and a few studies involved juvenile participants. The types of studies included measurement practice studies (n=17), surgery studies (n=30), studies of orthotics (n=19) or other clinical interventions (n=4), and observational studies (n=8). We identified 20 different diagnoses of foot and ankle pathology that were measured by FFI and FFI-R (Table 2). Among them, RA and plantar fasciitis were the two most common diagnoses and were also noted to be the most painful and disabling foot conditions. These studies were conducted by investigators in 17 countries; the United States, the Netherlands, and the United Kingdom were the three most frequent users of the FFI and FFI-R in studies involving foot and ankle problems (Table 3).
Table 4 displays the versatility of the FFI with all 3 domains, the FFI subscales, and the FFI-R across the studies. This shows that clinicians and researchers chose among the FFI scales depending on the nature of their studies. Among the various scales of the FFI, we found the FFI with all 3 domains (full scale), the FFI pain subscale only, and a combination of the pain and disability subscales to be the most frequently used, whereas the FFI-R was the least frequently used. The Dutch adaptation of the FFI, the FFI-5pts, was mostly used in the Netherlands as an outcome measure in studies of many surgical interventions.
In summary, the FFI with all 3 domains, or as subscales, was frequently chosen as a measurement instrument across various studies and countries and among various age groups and sexes, for the assessment of acute and chronic foot and ankle conditions.
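The counts reported for Objective 1 can be cross-checked with a few lines of arithmetic. The sketch below only restates figures given in the text; the variable names are illustrative.

```python
# Figures as reported in the text for Objective 1.
total_participants = 4714
male, female = 1914, 2688
study_types = {"measurement": 17, "surgery": 30, "orthotics": 19,
               "other clinical": 4, "observational": 8}
database_articles, additional_articles = 53, 25

unreported = total_participants - (male + female)  # gender not reported


def pct(n):
    """Share of all participants, rounded to a whole percent."""
    return round(100 * n / total_participants)


print(pct(male), pct(female), pct(unreported))  # shares of male, female, unreported
print(sum(study_types.values()), database_articles + additional_articles)
```

The rounded shares reproduce the 41%, 57%, and 2% stated above, and both the study-type counts and the two article sources sum to the 78 articles reviewed.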

Objective 2: Uses of the FFI and FFI-R in the field of foot health research
The uses of the FFI and FFI-R are provided in detail in Tables 5, 6, 7, 8, 9, and 10. Table 12 describes the study types, the name of the instruments, and the first author's [...] [7], the FFI-R [11]. The FFI Side to Side was derived from pain and disability subscales of the FFI [23]. The Ankle Osteoarthritis Scale (AOS) [24] measured foot problems related to foot and ankle osteoarthritis. Agel et al. [25] modified the rating scale of the FFI pain and function subscales from the visual analog rating scale (VAS) to the Likert categorical scale; this modification was tested in a sample of individuals with nontraumatic foot complaints, and the metric of the Likert [...] [27] found that the Rand 36-Item Short Form Health Survey (SF-36) scores of a sample of individuals with foot and ankle disorders were moderately correlated with FFI scores and concluded that FFI scores can be used to monitor the quality of life of these patients. Shrader et al. [28] measured the stability of navicular joint alignment and found that this measure correlated well with the FFI scores of the sample. Helliwell et al. [29] developed a new measure, the Foot Impact Scale (FIS), to measure the impact of foot problems on foot health in a sample of individuals with RA; the metric of FIS was validated with the FFI and HAQ. In an RA study, van der Leeden et al. [30] reported that Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) and Disease Activity Scores in 44 joints (DAS 44) were correlated with FFI scores; furthermore, these authors found that the FFI pain subscale scores correlated with forefoot pain, whereas the FFI function subscale scores correlated with hindfoot problems. The FFI scores were also used as validation measures of the American Orthopedic Foot and Ankle Society (AOFAS) clinical rating scales, an instrument that was widely used by foot and ankle surgeons [31]. These validation studies were reported by Baumhauer et al. [32] for the AOFAS hallux clinical rating scale and by Ibrahim et al. [33] for the AOFAS clinical rating scale, which was well to moderately correlated with FFI scores. The latter finding was based on a study with a 41% response rate in a sample consisting of 45 individuals.
Category C Cultural Adaptation or Translation. The first translation of the FFI was the Dutch-language instrument known as Dutch FFI-5pts [3]. The German-language translation of the instrument is the FFI-G [34]; the FFI was also translated into Brazilian Portuguese [35], Taiwan Chinese [36], Turkish [26], and Czech [37]. There was also a Spanish translation conducted by the MAPI Institute in Lyon, France [38]. These translations complied with rigorous language translation procedures; occasionally, some item adjustments of the scales were needed.
In summary, the FFI was developed with good reliability and validity; it also inspired and served as criterion validity for newer foot health measures and attracted the attention of researchers around the world, who conducted translations and adaptations of the tool into their native languages and cultures. Table 6 is a supplement to Table 5 and displays the clinimetrics of the instruments listed in Table 5; measures were metrically good, with reliability and validity values greater than 0.7, with one exception in which the pain subscale had a reliability of 0.64 [3].

Surgical intervention
The FFI is one of the outcome measures most frequently used by AOFAS members [31]. It was first used to measure surgical outcomes. The surgical interventions and outcomes are summarized in Table 7. There are 30 articles, categorized generally according to type and location of surgical procedure. Five distinct procedural categories were identified as follows: (a) arthrodeses within the foot or ankle [39][40][41][42][43][44][45][46][47], (b) arthroplasty within the foot or ankle [48][49][50][51], (c) fracture care of the foot or ankle [52][53][54][55], (d) deformity reconstruction surgery of the foot or ankle [56][57][58][59][60], and (e) various surgical interventions for chronic conditions [61][62][63][64]. The FFI was also used to assess outcomes of less invasive procedures, such as calcaneal spur treatment by arthroscopy [37], distal tibia repair using fixation with cannulation osteosyntheses [65], arthroscopic chondrocyte implant of the tibia and fibula [66], and surgical interventions for complex ankle injuries [67]. In summary, the FFI and the Dutch FFI-5pts appeared to be useful in measuring outcomes of various surgical procedures in children, adults, and individuals with acute, chronic, and congenital foot and ankle problems.
Table 8 lists studies using foot function outcome measures in orthotic interventions in the foot and ankle. The studies assessed the impact of orthotic treatment on forefoot, midfoot, and hindfoot/ankle pathology. Orthotic treatment of the forefoot in patients with RA improved the scores for pain, disability, and activities [68,69]; however, the scores were unchanged in the study by Conrad et al. [70]. Other studies using special shoes and shoe inserts showed relief of hallux valgus pain [71] and of hindfoot and forefoot problems [72,73], as well as slowing of the progression of hallux valgus in early RA [74].
Midfoot studies assessing the effect of full-length orthoses on pain relief [75] and mobility [76] used the FFI-R as an outcome measure. For hindfoot conditions, studies of treatment with orthoses addressed heel pain [77], plantar fasciitis [35,78,79], stabilization of hindfoot valgus [80], correction of posterior tibialis tendon dysfunction [81], destructive hemophilic arthropathy of the foot and ankle [82], and juvenile idiopathic arthritis of the foot and ankle [83]. Shoes/shoe inserts have also been found to relieve foot and ankle pain from arthritides [84,85]. In summary, the FFI and FFI-R clearly provided useful outcome measures for orthotic management of a wide range of foot and ankle disorders.

Medical intervention
The FFI was also used to measure foot health outcomes associated with medical interventions (Table 9), such as cortisone injection for adhesive capsulitis of the ankle [86]; the injection resulted in improved FFI pain and disability subscale scores. Di Giovanni et al. [87] measured the outcome of stretching exercises for plantar fasciitis versus Achilles tendonitis; both groups showed improvement in FFI pain subscale scores. Kulig et al. [88] used the FFI pain and disability subscales to measure the outcomes of exercise intervention in posterior tibial tendon dysfunction. Rompe et al. [89] reported that the FFI pain score improved in the stretching treatment group of a randomized clinical trial using stretching and shockwave therapy to treat patients with plantar fasciopathy. Overall, the FFI was useful in measuring the outcomes of conservative interventions in chronic foot and ankle conditions.

Observational studies
Investigators chose the FFI scores or the subscale scores to determine the prevalence and disease burden of foot and ankle conditions in the general population (Table 10). Novak et al. [4] used FFI scores to evaluate type 2 diabetes with and without neuropathy and found that the group with neuropathy had worse FFI scores. Williams and Bowden [90] correlated high FFI scores with foot morbidity in rheumatic diseases and estimated cost of care/staffing concerns for that patient subset. Williams [91] also used the FFI scores in patients with Paget's disease and noted the impacts on plantar foot pressures, gaits, and ambulation abilities. Kamanli et al. [92] correlated the scores of the FFI and foot bone mineral density, then extrapolated these scores to that individual's skeletal bone density. Kavlak and Demitras [93] reported a strong correlation of FFI scores with the scores of VAS pain scale, foot pain scale (FPS), and hindfoot function scale (HFS) in patients with foot problems. Goldstein et al. [94] noted that FFI scores of individuals with previous foot injuries had a high correlation with 6 other foot function instruments. Rosenbaum et al. [95] found that plantar sensory impairment of the foot in patients with RA was correlated with poor FFI scores. Schmiegel et al. [96] found that pedobarograph scores of patients with RA with foot pain were correlated with poor FFI and HAQ scores. In summary, FFI scores were useful in detecting the prevalence of foot and ankle problems and as a measure of concurrent validity for other foot health measures in various chronic foot conditions.
In all, we found the FFI instrument was frequently chosen as an outcome measure of surgical, orthotic, and medical treatments, but its application was wider than we originally imagined. It was not limited to outcome measures; FFI scores were also applied in the promotion of foot health as a common public health issue and in increasing the awareness of health system administrators. The FFI was also used in the validation of newly developed foot health measures.

Objective 3: The strengths and weaknesses of the FFI and FFI-R as reported in the literature
FFI: The FFI questionnaire had good psychometric properties [97][98][99][100], and the pain subscale was sensitive to change during instrument development [13]. In a study about treatment of plantar fasciitis in individuals with chronic foot pain, SooHoo et al. [64] reported that the pain subscale of the FFI had a high standardized response mean (SRM) and a high effect size (ES) as an outcome measure of surgery in chronic foot and ankle problems, while Landorf and Radford measured the clinical ability to detect a change as the minimal important difference (MID) in plantar fasciitis [101]. All these clinical measures add to the credibility of the FFI as a self-reporting measure. The FFI reflects patients' assessment of their symptoms/health status, which guides providers in proper care planning and progress toward treatment goals. The FFI is one of the most cited measures of its kind [102].
There are weaknesses of the FFI. During the development of the index, clinicians generated the questionnaire items without patient participation [13,97]; therefore, items might not fully reflect patients' needs, might be sex biased [7], and might not be applicable to high-functioning individuals. A theoretical model was not part of the design, nor were the items related to footwear [13,103], which are essential to support the construct of this instrument. It is also lacking items for measuring quality of health and satisfaction with care; however, these items can be appended as a global statement in the questionnaire. In all, the FFI has been the most studied and widely used foot-specific self-reporting measure; however, further testing by gender, age, race, language, etc. would provide assurance of its generalizability.
FFI-R: The FFI-R was developed in response to criticism of the FFI and to address issues of contemporary interest. Most original items from the FFI were selected in the development of the FFI-R, and new items about footwear and psychosocial factors were added, which improved its construct coverage. Patients and clinicians were involved in the generation of items. Its design closely followed the ICF theoretical model [13]; its psychometric properties are strong and are based on the IRT 1-parameter or the Rasch measurement model. It was designed to be a comprehensive measure of foot health-related quality of life, with both long and short forms [99], allowing clinicians and researchers to choose the measures they need for the intended study. Although the FFI-R did not include information on clinical ability to measure change in its development, Rao et al. [75,76] did measure the minimal detectable change (MDC) and the effect size in individuals with midfoot arthritis, which also added to the credibility of its metrics.
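The responsiveness indices named under this objective (ES, SRM, and MDC) have conventional textbook definitions, sketched below for orientation. The cited studies may have used variants, so this is an illustration under stated assumptions and not a reconstruction of their calculations; the function names and toy scores are invented, and improvement is taken as baseline minus follow-up because higher FFI scores indicate worse foot health.

```python
from math import sqrt
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """ES: mean change divided by the standard deviation of baseline scores."""
    change = [b - f for b, f in zip(baseline, follow_up)]
    return mean(change) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """SRM: mean change divided by the standard deviation of the change scores."""
    change = [b - f for b, f in zip(baseline, follow_up)]
    return mean(change) / stdev(change)


def minimal_detectable_change(sd, reliability, z=1.96):
    """MDC at 95% confidence: z * sqrt(2) * SEM, with SEM = sd * sqrt(1 - reliability)."""
    sem = sd * sqrt(1 - reliability)
    return z * sqrt(2) * sem


# Toy pain-subscale scores for four patients (not study data).
before = [60, 50, 70, 40]
after = [40, 35, 45, 30]
print(effect_size(before, after), standardized_response_mean(before, after))
print(minimal_detectable_change(sd=10, reliability=0.84))
```

The MID reported by Landorf and Radford is a different kind of quantity: it is anchored to what patients perceive as an important change and cannot be derived from score distributions alone.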

Objective 4: The newly analyzed FFI-R with improved psychometric values
The full scale and short form
For the FFI-R L (68 items) [11], person reliability was high (0.96). In the PCA, 56.8% of the variance was explained by the measure, with only 10.6% of the variance explained by the first factor of residuals. These findings support that the full FFI-R meets the unidimensionality requirement of the Rasch model. Further, the ratio of the variance explained by the measure to the raw variance in the first contrast of residuals was 5.4 (i.e., greater than the criterion of 3). For the FFI-R S (34 items) [11], person reliability was 0.95, similar to the reliability estimate of the FFI-R L. The PCA of the FFI-R S revealed that unidimensionality criteria were also satisfied. This supports the use of a short form of the measure, because the item response burden on patients is lower, at 34 questions. Because this measure is as reliable as the full measure, its use is supported for clinical settings.

Subscales
All subscales of the FFI-R had strong person reliability estimates (Table 11), ranging from 0.78 to 0.94. The PCA indicated that unidimensionality held for each subscale, with the exception of the stiffness subscale. Further inspection of the data revealed that the two-factor solution reflected groups of the low-severity and high-severity items and was not the result of a competing factor. Unidimensionality for the limitation subscale was met after dropping item 41 (ASSISTO), an item listed in the FFI-R database. Overall, the subscales of the FFI-R satisfied unidimensionality criteria and were reliable measures of the latent traits (Table 11).

Response category analysis
The response category analyses for each of the subscales (done after collapsing Categories 5 and 6) revealed that, for the first three subscales (pain, stiffness, and difficulty), the response categories behaved as required by the Rasch model. However, for the subscales of limitation and social issues (both of which are time scales), there was some indication that respondents had difficulty distinguishing between "2 = A little of the time" and "3 = Some of the time." We therefore considered collapsing these categories and giving all FFI-R subscales four possible response categories. This would ensure uniformity of the measure and decrease the response burden on patients. Accordingly, for the first three subscales, which measure severity, the categories "3 = Severe pain," "4 = Very severe pain," and "5 = Worst pain imaginable" were collapsed. This was justified because all three captured the notion of severe pain. Overall, the analyses showed that the response to each item functioned well with the four response categories.

Discussion
This review evaluated 78 eligible articles (Figure 1). Over the past 20 years, the FFI and FFI-R have been widely used across national and international clinical and research communities. The instruments were administered to over 4700 male and female study participants worldwide, across age groups, with 20 different diagnoses comprising congenital, inflammatory/degenerative, acute, and chronic foot and ankle problems. The FFI was also incorporated into newer foot health measures [23,24] and underwent changes in the measurement scale from VAS to Likert scale, such as the change made by Agel et al. [25]. Scale changes also occurred in the adaptation of the FFI into Dutch [3], German [34], and Taiwan Chinese [36], and in our revised FFI-R [11], to give a few examples. The strong metrics of the FFI subscales and full scale (Table 12, Category A) facilitated investigators' choice to use the subscale(s) or the full scale in clinical or research applications as appropriate. The FFI was also frequently used as a validation criterion for other foot health measures (Table 12, Category B); this usage has elevated the credibility of the FFI as an outcome measure for foot and ankle problems. Because the FFI was developed using CTT procedures, it is sample and content dependent; its metrics were therefore tested in many different samples and proved to be consistently strong. The exception was the study of Baumhauer et al. [32], where high foot functioning was evident in the sample; investigators should therefore exercise caution in the interpretation of this result. While the FFI was developed initially as a disease-specific measure for early RA, in later years it was used in many non-RA foot and ankle problems and proved to be a valid measure there as well. The FFI and FFI-R were frequently used as outcome measures in surgical and clinical interventions with positive results (Tables 7, 8, 9, and 10). The FFI scores were also used in many observational studies (Table 10), and those reports might be helpful for researchers and health system administrators in establishing health policy. Although the FFI was extensively studied and generally received positive ratings [23,29,102], we realized the need for improvement in the measures of the FFI and FFI-R and have discussed these issues comprehensively under Objective 3 in this paper. We conducted a reanalysis and made improvements to the metrics and scales of the FFI-R, as presented in Table 11 and in the FFI-R Long Form (see Additional file 1) and Short Form (see Additional file 2) questionnaires.
In recent articles in which the FFI was used as an outcome measure, the authors have included clinical measures: the effect size and standardized response mean [64] and the minimal important difference [101]. Rao et al. reported the minimal detectable change and effect size of the FFI-R [75]. All of these have increased the credibility of the clinical use of the FFI and help in power analysis and sample size estimation for future studies.

Limitations of this review
Our literature search was limited to publications written in the English language and covered only publications through 2010; therefore, it might have excluded FFI- and FFI-R-related articles not written in English, as well as more recent articles published in English.