Inferring mobility measures from GPS traces with missing data

Ian Barnett, Jukka-Pekka Onnela

Abstract

With increasing availability of smartphones with Global Positioning System (GPS) capabilities, large-scale studies relating individual-level mobility patterns to a wide variety of patient-centered outcomes, from mood disorders to surgical recovery, are becoming a reality. Similar past studies have been small in scale and have provided wearable GPS devices to subjects. These devices typically collect mobility traces continuously without significant gaps in the data, and consequently the problem of data missingness has been safely ignored. Leveraging subjects' own smartphones makes it possible to scale up and extend the duration of these types of studies, but at the same time introduces a substantial challenge: to preserve a smartphone's battery, GPS can be active only for a small portion of the time, frequently less than $10\%$, leading to a tremendous missing data problem. We introduce a principled statistical approach, based on weighted resampling of the observed data, to impute the missing mobility traces, which we then summarize using different mobility measures. We compare the strengths of our approach to linear interpolation (LI), a popular approach for dealing with missing data, both analytically and through simulation of missingness for empirical data. We conclude that our imputation approach better mirrors human mobility both theoretically and over a sample of GPS mobility traces from 182 individuals in the Geolife data set, where, relative to LI, imputation resulted in a 10-fold reduction in the error averaged across all mobility features.

Keywords: GPS; Imputation; Missing data; Mobility; Precision medicine; mHealth.

© The Author 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Figures

Fig. 1.
Theoretical unobserved trajectories and their surrogates. Trajectories are generated according to the theoretical model of Section 3.1. Panels (A), (C), and (E) represent a shorter period of missingness, while panels (B), (D), and (F) represent a longer period of missingness. The solid lines represent the true unobserved and simulated trajectories over an interval of n units of time, and the dashed line represents the expected trajectory. It is assumed that the locations immediately before and immediately after this interval are observed and known. The straight line represents LI as a means of imputing the missing gap. The labeled angle is the starting angle of the mean trajectory, measured from the axis. LI is best when the expected trajectory is a straight line, but only the simulation approach is robust to curvature.
Fig. 2.
Expected average gap between imputed trajectories and the true unobserved trajectory. The length of the interval of missingness, n, is varied over a range in fixed increments. For each value of n, the average over simulations is used to estimate the average squared gap for the simulation approach. While LI can be a better approximation to the true trajectory when the interval is short and the curvature is small, asymptotically the simulation approach is better for any amount of curvature.
Fig. 3.
A person’s daily trajectories over the course of a week. The bottom row shows the person’s trajectories when GPS is captured continuously. The top row shows the same trajectories with simulated missingness, in which GPS is assumed to be recorded only for 2-min intervals separated by 10-min gaps. Lines represent flights (periods of movement). Points represent pauses (periods where the person is stationary), with larger points indicating longer pauses.
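
The missingness pattern described for Fig. 3 and the LI baseline can be illustrated with a minimal Python sketch. It is not the authors' code and does not implement their weighted-resampling imputation. The `(t, x, y)` tuple layout, the planar coordinates, and the function names `is_observed`, `mask_trace`, and `linear_interpolate` are assumptions made for illustration; only the 2-min on, 10-min off cycle comes from the caption.

```python
from bisect import bisect_right

ON_SECONDS = 2 * 60    # GPS recorded for 2 minutes (from the Fig. 3 caption)
OFF_SECONDS = 10 * 60  # followed by a 10-minute gap (from the Fig. 3 caption)


def is_observed(t, on=ON_SECONDS, off=OFF_SECONDS):
    """Return True if time t (seconds since start) falls in a recording window."""
    return (t % (on + off)) < on


def mask_trace(trace, on=ON_SECONDS, off=OFF_SECONDS):
    """Keep only the (t, x, y) samples that fall inside recording windows."""
    return [p for p in trace if is_observed(p[0], on, off)]


def linear_interpolate(observed, t):
    """Estimate (x, y) at time t by LI between the neighbouring observed samples.

    `observed` must be sorted by time. Times outside the observed range are
    clamped to the first or last observed location (an assumption of this sketch).
    """
    times = [p[0] for p in observed]
    i = bisect_right(times, t)
    if i == 0:
        return observed[0][1:]
    if i == len(observed):
        return observed[-1][1:]
    t0, x0, y0 = observed[i - 1]
    t1, x1, y1 = observed[i]
    w = (t - t0) / (t1 - t0)  # fraction of the gap elapsed at time t
    return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))


# Hypothetical trace: one sample per minute for an hour, moving along a straight line.
trace = [(t, float(t), 0.0) for t in range(0, 3600, 60)]
observed = mask_trace(trace)
print(len(observed), "of", len(trace), "samples kept")
print(linear_interpolate(observed, 400))
```

On this straight-line trace LI recovers the hidden positions up to rounding, which matches the Fig. 1 caption's remark that LI is best when the expected trajectory is a straight line. On a curved trajectory the same function would cut across the bend.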

Source: PubMed
