Dynamic 3D shape of the plantar surface of the foot using coded structured light: a technical report

Background
The foot provides a crucial contribution to the balance and stability of the musculoskeletal system, and accurate foot measurements are important in applications such as designing custom insoles and footwear. With a better understanding of the dynamic behavior of the foot, dynamic foot reconstruction techniques are emerging as useful ways to properly measure the shape of the foot. This paper presents a novel design and implementation of a structured-light prototype system providing dense three-dimensional (3D) measurements of the foot in motion. The input to the system is a video sequence of a foot during a single step; the output is a 3D reconstruction of the plantar surface of the foot for each frame of the input.

Methods
Engineering and clinical tests were carried out to assess the accuracy and repeatability of the system. Accuracy experiments involved imaging a planar surface from different orientations and elevations and measuring the fitting errors of the data to a plane. Repeatability experiments used reconstructions from 27 subjects; for each subject, both the right and left feet were reconstructed in static and dynamic conditions on two different days.

Results
The static accuracy of the system was found to be 0.3 mm with planar test objects. In tests with real feet, the system proved repeatable, with reconstruction differences between trials one week apart averaging 2.4 mm (static case) and 2.8 mm (dynamic case).

Conclusion
The accuracy and repeatability obtained in the experiments compare favorably with the current literature, and the design is superior to previously reported systems in several respects. Further studies are needed to quantify the reliability of the system in clinical environments.

The calibration target is composed of two perpendicular planes, each containing 20 equally sized squares with 20 mm sides. The spacing between squares is also 20 mm. The origin of the global reference frame is at the bottom intersection of the two planes. The global co-ordinates of the corners of the 3D squares and the image co-ordinates of their corresponding image points are used to estimate the calibration parameters. We adopted the Faugeras-Toscani calibration algorithm in the 4DFRS [4].
Projector parameters are calibrated after camera parameters. A grid pattern is projected onto the calibration target, held in the same position used for camera calibration. The grid corners and their images (Figure 1) are used as calibration points. Taking into account the known geometry of the calibration object, and using the camera calibration parameters, the 3D locations of the grid intersections can be computed by triangulation. Once these are known, the Faugeras-Toscani algorithm can be used to estimate the calibration parameters of the projector [3].
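The linear estimation at the heart of this calibration step can be sketched with a standard Direct Linear Transform (DLT). Note this is a simplified stand-in, not the Faugeras-Toscani algorithm itself, which additionally enforces a unit-norm constraint on part of the solution; the synthetic camera used below is illustrative and not from the paper.

```python
import numpy as np

def estimate_projection_matrix(pts3d, pts2d):
    """Linear (DLT-style) estimate of the 3x4 projection matrix from
    3D-2D correspondences such as the calibration-target corners."""
    # Each 3D-2D pair contributes two rows to a 2n x 12 homogeneous system.
    A = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 4)

def project(P, pt3d):
    # Apply the projection matrix and dehomogenize to pixel co-ordinates.
    x = P @ np.append(pt3d, 1.0)
    return x[:2] / x[2]
```

Given at least six non-coplanar calibration points (the two perpendicular target planes guarantee this), the matrix is recovered up to scale; the same routine applies to the projector once the 3D grid intersections are known.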

Sequence acquisition
A comprehensive review of CSL techniques by Salvi et al. [5] shows that techniques using spatial neighborhood coding are the most suitable for dynamic surface reconstruction [6][7][8][9][10]. The 4DFRS uses the simple yet accurate approach by Pages et al. [3], i.e., spatial-coded/peak-based structured light. Figure 1 illustrates the pattern used in the foot reconstruction system. It consists of 64 colored stripes with black bands between each pair of consecutive stripes. The arrangement of stripes is based on a De Bruijn sequence with four colors and a window property of three, meaning that any three consecutive stripes form a unique color sequence within the pattern. This enables robust correspondence estimation. The pattern illuminates the plantar surface as the foot takes a step and the video camera records a sequence of frames. These frames are then processed sequentially to reconstruct the shape of the foot surface. Since all the information needed for reconstruction is encoded in a single pattern, the plantar surface can be reconstructed from every frame, so the system operates at the full camera frame rate.
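A De Bruijn sequence with alphabet size 4 and window length 3 has exactly 4^3 = 64 symbols, matching the 64 stripes of the pattern. The sketch below generates such a sequence with the standard recursive algorithm and maps it to stripe colors; the specific color assignment is illustrative, as the excerpt does not name the four colors.

```python
def de_bruijn(k, n):
    """Generate a cyclic De Bruijn sequence over alphabet {0..k-1} in which
    every length-n window appears exactly once (standard recursive algorithm)."""
    a = [0] * (k * n)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq

# Hypothetical color assignment -- the excerpt does not list the four colors.
COLORS = ["red", "green", "blue", "magenta"]
stripe_colors = [COLORS[s] for s in de_bruijn(4, 3)]  # 64 stripes
```

Because every three-stripe window is unique, observing any three consecutive stripe colors in the image identifies their position in the pattern, which is what makes per-frame (single-shot) decoding possible.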

3D reconstruction
Correspondences between the image stripes and the original pattern are calculated. This process occurs in two stages. Stage one consists of locating the centers of the colored stripes on the captured image; stage two compares the segmented stripes with the coded pattern in order to determine correspondences.
The center of each stripe is located row by row. At each pixel, a positive edge-strength function g(i) is computed from the filtered color channels, where i is the index of the pixel on a given row, R (resp. G, B) is the red (resp. green, blue) channel of the image, and d denotes a derivative-style filter applied to the monochromatic rows. The parameter o is the spatial width of the filter. With an ideal top-hat signal, d(f) is maximum and positive at the rising edge of the signal, zero at the center, and minimum and negative at the falling edge.
Consequently, the maxima of g identify the stripe edges. Stripe centers are located with subpixel accuracy as the normalized centroid of the non-black segments between two maxima [11].
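The exact forms of g and of the filter d are not reproduced in this excerpt, so the sketch below uses one plausible choice consistent with the description: a box-derivative filter of half-width o, and g as the summed absolute channel responses. It illustrates the edge-then-centroid pipeline rather than the paper's exact formulas.

```python
import numpy as np

def stripe_centers_row(row_rgb, o=3):
    """Locate stripe centers along one image row.
    row_rgb: (W, 3) array; o: spatial half-width of the edge filter.
    The forms of d and g below are assumptions, not the paper's exact ones."""
    W = row_rgb.shape[0]

    def d(f):
        # Box-derivative filter: positive at rising edges, negative at falling ones,
        # zero at the center of an ideal top-hat stripe profile.
        out = np.zeros(W)
        for i in range(o, W - o):
            out[i] = f[i + 1:i + 1 + o].sum() - f[i - o:i].sum()
        return out

    # Positive edge-strength function combining the three color channels.
    g = sum(np.abs(d(row_rgb[:, c])) for c in range(3))
    # Local maxima of g mark the stripe edges.
    edges = [i for i in range(1, W - 1)
             if g[i] > g[i - 1] and g[i] >= g[i + 1] and g[i] > 0]
    # Subpixel center: intensity-weighted centroid between consecutive edges.
    lum = row_rgb.sum(axis=1)
    centers = []
    for a, b in zip(edges[:-1], edges[1:]):
        seg = lum[a:b]
        if seg.sum() > 0:
            centers.append(a + (np.arange(b - a) * seg).sum() / seg.sum())
    return centers
```

On a synthetic row containing a single bright stripe, the detected center falls at the stripe's centroid even though the edges are located only to the nearest pixel.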
Following Zhang et al. [10], the correspondence between image stripes and pattern stripes was solved using single-pass dynamic programming, a well-established approach to the correspondence problem in structured-light systems [3,12,13].
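The core idea can be sketched as an order-preserving alignment of the observed stripe-color sequence against the pattern sequence. The edit-distance-style costs below are illustrative assumptions, not the scoring used by Zhang et al.; dropped or occluded stripes are handled as skips.

```python
def match_stripes(observed, pattern, skip_cost=1.0, mismatch_cost=2.0):
    """Order-preserving alignment of observed stripe colors to pattern stripes
    by dynamic programming. Returns a list of (obs_idx, pat_idx) pairs.
    Costs are illustrative assumptions."""
    n, m = len(observed), len(pattern)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * skip_cost
    for j in range(1, m + 1):
        cost[0][j] = j * skip_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mis = 0.0 if observed[i - 1] == pattern[j - 1] else mismatch_cost
            cost[i][j] = min(cost[i - 1][j - 1] + mis,    # match / mismatch
                             cost[i - 1][j] + skip_cost,  # spurious observed stripe
                             cost[i][j - 1] + skip_cost)  # occluded pattern stripe
    # Backtrack, keeping only exact color matches as correspondences.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        mis = 0.0 if observed[i - 1] == pattern[j - 1] else mismatch_cost
        if cost[i][j] == cost[i - 1][j - 1] + mis:
            if mis == 0.0:
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + skip_cost:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs
```

For example, if one stripe is lost to occlusion, the alignment skips the corresponding pattern stripe and still recovers the remaining correspondences in order.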
The 3D co-ordinates of surface points are calculated by triangulation. The back-projection ray through the center of the camera reference frame and a stripe pixel is intersected with the plane defined by the center of the projector reference frame and the pattern stripe corresponding to the pixel.
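The ray-plane intersection itself is a few lines. The sketch below assumes the back-projection ray (from the camera center through the stripe pixel) and the projector stripe plane are already expressed in the same global reference frame:

```python
import numpy as np

def ray_plane_intersect(ray_origin, ray_dir, plane_point, plane_normal):
    """Intersect a back-projection ray with a stripe plane.
    Returns the 3D point, or None if the ray is parallel to the plane."""
    ray_origin = np.asarray(ray_origin, dtype=float)
    ray_dir = np.asarray(ray_dir, dtype=float)
    denom = float(np.dot(plane_normal, ray_dir))
    if abs(denom) < 1e-12:
        return None  # ray parallel to the plane: no finite intersection
    t = np.dot(np.asarray(plane_normal, dtype=float),
               np.asarray(plane_point, dtype=float) - ray_origin) / denom
    return ray_origin + t * ray_dir
```

Repeating this for every decoded stripe pixel in a frame yields the dense 3D point set for that frame's plantar surface.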