Computer Vision to Automatically Assess Infant Neuromotor Risk

Claire Chambers, Nidhi Seethapathi, Rachit Saluja, Helen Loeb, Samuel R Pierce, Daniel K Bogen, Laura Prosser, Michelle J Johnson, Konrad P Kording

Abstract

An infant's risk of developing neuromotor impairment is primarily assessed through visual examination by specialized clinicians. Therefore, many infants at risk for impairment go undetected, particularly in under-resourced environments. There is thus a need to develop automated, clinical assessments based on quantitative measures from widely-available sources, such as videos recorded on a mobile device. Here, we automatically extract body poses and movement kinematics from the videos of at-risk infants (N = 19). For each infant, we calculate how much they deviate from a group of healthy infants (N = 85 online videos) using a Naïve Gaussian Bayesian Surprise metric. After pre-registering our Bayesian Surprise calculations, we find that infants who are at high risk for impairments deviate considerably from the healthy group. Our simple method, provided as an open-source toolkit, thus shows promise as the basis for an automated and low-cost assessment of risk based on video recordings.

Figures

Fig. 1. Flowchart of the pipeline for computer vision-based neuromotor risk assessment.
We created a normative database of infant movement using videos found online (85 infants) and recorded infants at risk of neuromotor disease in a clinical setting (19 infants). Using video frames labelled with body-part landmarks from a subset of our video dataset, we adapted a pose estimator (OpenPose) to infants via domain adaptation. Using the adapted system, we then extracted pose from all videos. Next, from the pose data, we quantified kinematic features for each infant. Finally, our neuromotor risk prediction used a Naïve Gaussian Bayesian Surprise metric that estimates the probability that each infant belongs to the reference population.
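The final scoring step can be sketched in a few lines. This is a minimal illustration of a Naïve Gaussian Bayesian Surprise score under the naive (independent-feature) assumption, not the toolkit's exact implementation: each kinematic feature is fit with a Gaussian on the reference sample, a subject's summed log-likelihood is computed, and the result is z-normalized against the reference infants' own scores.

```python
import numpy as np

def gaussian_logpdf(x, mu, sigma):
    # Log-density of independent Gaussians, evaluated element-wise.
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def normalized_bayesian_surprise(reference, subject):
    """Naive Gaussian Bayesian Surprise (illustrative sketch).

    reference : (n_infants, n_features) kinematic features of the healthy sample
    subject   : (n_features,) features of the infant being assessed
    Returns a z-score; more negative means a smaller probability of
    belonging to the reference population (higher risk)."""
    mu = reference.mean(axis=0)
    sigma = reference.std(axis=0, ddof=1)
    log_lik = lambda x: gaussian_logpdf(x, mu, sigma).sum()
    ref_scores = np.array([log_lik(r) for r in reference])
    return (log_lik(subject) - ref_scores.mean()) / ref_scores.std(ddof=1)
```

An infant whose features sit far from the reference distribution receives a strongly negative score, matching the direction of the scale in Figs. 6 and 8.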
Fig. 2. Infant testing at CHOP in the PANDA gym.
The infant was placed at the center of a sensorized mat and was recorded using GoPro cameras.
Fig. 3. Preprocessing of pose data.
We took raw pose data for whole videos as input (frame coordinates of body landmarks). To filter the pose data, we interpolated the raw signal to replace missing data, applied a rolling-median filter to remove outliers, and finally used a rolling-mean filter. This provided a smooth signal from which to compute derivatives. To ensure that we could compare infants recorded under different conditions (camera angles, video resolution, etc.), we then rotated and normalized body landmark coordinates in each frame: we rotated the upper-body landmarks with respect to the center of the shoulders and the lower-body landmarks with respect to the center of the hips. Next, we normalized the landmark coordinates within each frame by subtracting a reference landmark (the neck) and dividing by a reference distance (the trunk length). Finally, based on the pre-processed signals, we computed kinematic variables (position or angle, velocity, and acceleration) from selected body landmark coordinates and joint angles.
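The filtering and normalization steps above can be sketched as follows. Window sizes and function names are illustrative, not the toolkit's API; the rotation step is omitted for brevity.

```python
import numpy as np
import pandas as pd

def preprocess_landmark(xy, median_win=5, mean_win=5):
    """Filter one landmark trajectory (T x 2 array of frame coordinates):
    interpolate missing detections, rolling-median to suppress outliers,
    then rolling-mean to smooth before computing derivatives."""
    df = pd.DataFrame(xy, columns=["x", "y"])
    df = df.interpolate(limit_direction="both")  # fill gaps from missed detections
    df = df.rolling(median_win, center=True, min_periods=1).median()
    df = df.rolling(mean_win, center=True, min_periods=1).mean()
    return df.to_numpy()

def normalize_frame(landmarks, neck, trunk_length):
    """Express landmarks relative to the neck and in trunk-length units,
    making recordings with different cameras and resolutions comparable."""
    return (landmarks - neck) / trunk_length
```

The median filter handles isolated pose-estimation glitches (a landmark jumping across the frame for one detection), while the mean filter smooths the remaining jitter so that velocity and acceleration can be computed by finite differences.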
Fig. 4. Pose-estimation model performance and example outputs.
(A) Scatterplot of the RMSE of the adapted pose-estimation model as a function of the RMSE of the OpenPose model before domain adaptation. Points show the RMSE for individual images in bounding-box units. In A, C, and E, the dotted line shows the diagonal, where performance before and after domain adaptation is equal. (B) Distribution of the difference in model error before and after domain adaptation. For each model, a single RMSE score was computed from errors between individual landmarks and labels averaged across the whole test dataset. RMSE (Adapted) – RMSE (Original) is shown by the red dotted line. In B, D, and F, the solid black line shows a difference of 0. The negative RMSE difference demonstrates improvement after domain adaptation. (C) Scatterplot of the precision of the adapted pose-estimation model as a function of the precision before domain adaptation. Points show the precision for individual images. (D) Distribution of the difference in precision before and after domain adaptation. Precision (Adapted) – Precision (Original) is shown by the red dotted line. (E) Scatterplot of the recall of the adapted pose-estimation model as a function of the recall before domain adaptation. Points show the recall for individual images. (F) Distribution of the difference in recall before and after domain adaptation. Recall (Adapted) – Recall (Original) is shown by the red dotted line. (G) Example of OpenPose outputs extracted from a video of an infant using our adapted pose-estimation system. (H) Image y-coordinates of the extremities for the same infant as in (G).
Fig. 5. Infant movement features.
Kinematic features of the reference sample (blue), low-risk infants (orange), moderate-risk infants (green), and high-risk infants (red) as a function of age in corrected weeks. Features are shown for the wrists: median absolute position (l), IQR of position (l), median velocity (l/s), IQR of velocity (l/s), IQR of acceleration (l/s²), left-right cross-correlation of position, and entropy of position. Visualizations of other features are provided as Supporting Information at the figshare link.
Fig. 6. Normalized Bayesian Surprise as a function of subject group.
The normalized Bayesian Surprise (z) is shown for the reference infant population and for at-risk infants recorded in the lab and evaluated by clinicians using the BINS score (Low Risk, Moderate Risk, and High Risk). More negative scores indicate a smaller probability of belonging to the reference population, i.e., higher risk. Points show individual data for each subject group, overlaid with the group mean (error bars = 95% confidence intervals (CI)).
Fig. 7. SVD analysis of movement feature data in terms of the three most important latent variables.
(A), (C), and (E) show values from singular vectors, which describe the infants in terms of latent variables. (B), (D), and (F) show the weighting of movement features in each latent variable. (A) Mean eccentricity (error bars = 95% CI) along the first singular vector (SV1 squared), as a function of participant group. (B) Weighting of movement features in SV1 ranked in descending order. (C) Mean SV2 squared (error bars = 95% CI) as a function of neuromotor risk. (D) The weighting of movement features in SV2 ranked in descending order. (E) Mean SV3 squared (error bars = 95% CI) as a function of neuromotor risk. (F) Weighting of movement features in SV3 ranked in descending order.
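The decomposition behind Fig. 7 can be sketched with a plain singular value decomposition of the infants-by-features matrix; the function name and the centering step are illustrative assumptions, not the authors' exact code.

```python
import numpy as np

def top_singular_components(features, k=3):
    """SVD of an (infants x features) matrix after column-centering.
    U[:, i] gives each infant's loading on latent variable i (the
    singular vectors plotted in panels A, C, E); Vt[i] gives the
    weighting of the movement features in that latent variable
    (panels B, D, F)."""
    X = features - features.mean(axis=0)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], S[:k], Vt[:k]
```

Squaring an infant's entries in a singular vector, as in the figure's "eccentricity" panels, measures how strongly that infant expresses the corresponding latent movement pattern.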
Fig. 8. Age-constrained normalized Bayesian Surprise as a function of subject group.
The normalized Bayesian Surprise (z) is shown for the reference infant population and for at-risk infants recorded in the lab and evaluated by clinicians using the BINS score (Low Risk, Moderate Risk, and High Risk). In contrast to Fig. 6, each infant is here compared only to infants within 10 weeks of their age, to explore any age effects on the results. More negative scores indicate a smaller probability of belonging to the reference population, i.e., higher risk. Points show individual data for each subject group, overlaid with the group mean (error bars = 95% confidence intervals (CI)).
Fig. 9. Pose-estimation model performance as a function of video resolution.
In our data, the performance of the domain-adapted pose estimation improves with video resolution. The figure shows how the RMSE, recall, and precision of the detected poses improve as a function of the number of pixels in the original videos (video resolution), plotting the mean and the 95% confidence interval of the mean.

Source: PubMed
