Cross-validation and out-of-sample testing of physical activity intensity predictions with a wrist-worn accelerometer

Alexander H K Montoye, Bradford S Westgate, Morgan R Fonley, Karin A Pfeiffer

Abstract

Wrist-worn accelerometers are gaining popularity for measurement of physical activity. However, few methods for predicting physical activity intensity from wrist-worn accelerometer data have been tested on data not used to create the methods (out-of-sample data). This study used two previously collected data sets [Ball State University (BSU) and Michigan State University (MSU)] in which participants wore a GENEActiv accelerometer on the left wrist while performing sedentary, lifestyle, ambulatory, and exercise activities in simulated free-living settings. Activity intensity was determined via direct observation. Four machine learning models (plus two combination methods) and six feature sets were used to predict activity intensity (30-s intervals) from the accelerometer data. Leave-one-out cross-validation and out-of-sample testing were performed to evaluate accuracy in activity intensity prediction, and classification accuracies were used to determine differences among feature sets and machine learning models. In out-of-sample testing, the random forest model (77.3-78.5%) had higher accuracy than other machine learning models (70.9-76.4%) and accuracy similar to combination methods (77.0-77.9%). Feature sets utilizing frequency-domain features had higher accuracy than other feature sets in leave-one-out cross-validation (92.6-92.8% vs. 87.8-91.9% in MSU data set; 79.3-80.2% vs. 76.7-78.4% in BSU data set) but similar or worse accuracy in out-of-sample testing (74.0-77.4% vs. 74.1-79.1% in MSU data set; 76.1-77.0% vs. 75.5-77.3% in BSU data set). All machine learning models outperformed the Euclidean norm minus one/GGIR method in out-of-sample testing (69.5-78.5% vs. 53.6-70.6%). From these results, we recommend out-of-sample testing to confirm generalizability of machine learning models. Additionally, random forest models and feature sets with only time-domain features provided the best accuracy for activity intensity prediction from a wrist-worn accelerometer.

NEW & NOTEWORTHY This study includes in-sample and out-of-sample cross-validation of an alternate method for deriving meaningful physical activity outcomes from data collected with a wrist-worn accelerometer. This method uses machine learning to directly predict activity intensity. In doing so, this study provides a classification model that may avoid the high errors present with energy expenditure prediction while still allowing researchers to assess adherence to physical activity guidelines.
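The contrast between leave-one-out cross-validation and out-of-sample testing can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the data are synthetic, the single feature and the nearest-centroid classifier are toy stand-ins for the study's feature sets and machine learning models, the intensity labels are hypothetical, and "leave-one-out" is assumed here to mean leaving out one participant at a time. The names `site_a` and `site_b` are illustrative placeholders for two independently collected data sets.

```python
import random

# Hypothetical intensity categories; the abstract does not list them.
INTENSITIES = ["sedentary", "light", "moderate", "vigorous"]


def make_dataset(n_participants, offset, seed):
    """Synthetic stand-in data: one feature value per 30-s window.

    `offset` shifts every feature value to mimic a systematic difference
    between two data collections (assumption for illustration only).
    """
    rng = random.Random(seed)
    rows = []
    for pid in range(n_participants):
        for level, label in enumerate(INTENSITIES):
            for _ in range(10):
                feature = level + offset + rng.gauss(0, 0.3)
                rows.append((pid, feature, label))
    return rows


def fit_centroids(rows):
    """Toy classifier: the mean feature value of each intensity class."""
    centroids = {}
    for label in INTENSITIES:
        values = [f for _, f, lab in rows if lab == label]
        centroids[label] = sum(values) / len(values)
    return centroids


def predict(centroids, feature):
    """Assign the class whose centroid is closest to the feature value."""
    return min(centroids, key=lambda lab: abs(centroids[lab] - feature))


def accuracy(centroids, rows):
    """Fraction of windows whose predicted class matches the label."""
    hits = [predict(centroids, f) == lab for _, f, lab in rows]
    return sum(hits) / len(hits)


def leave_one_participant_out(rows):
    """Train on all participants but one, test on the held-out one, average."""
    scores = []
    for held_out in sorted({pid for pid, _, _ in rows}):
        train = [r for r in rows if r[0] != held_out]
        test = [r for r in rows if r[0] == held_out]
        scores.append(accuracy(fit_centroids(train), test))
    return sum(scores) / len(scores)


def out_of_sample(train_rows, test_rows):
    """Train on one data set, test on a separately collected one."""
    return accuracy(fit_centroids(train_rows), test_rows)


site_a = make_dataset(8, offset=0.0, seed=1)
site_b = make_dataset(8, offset=0.6, seed=2)  # shifted on purpose

loocv_acc = leave_one_participant_out(site_a)
oos_acc = out_of_sample(site_a, site_b)
print(f"leave-one-participant-out accuracy: {loocv_acc:.3f}")
print(f"out-of-sample accuracy:             {oos_acc:.3f}")
```

Because the second synthetic data set is deliberately shifted, the model scores lower on it than in cross-validation. That is the kind of gap the abstract reports for frequency-domain feature sets, although the size of the gap here is an artifact of the chosen offset and says nothing about the study's actual data.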

Keywords: GENEActiv; artificial neural network; decision tree; random forest; support vector machine.

Source: PubMed
