Extraction of Information from Inertial Sensors to Aid Health Assessment


Various techniques can be used at each of the stages, and the appropriate ones are chosen by employing application-specific knowledge as the cornerstone.
I will be implementing the framework on two real-world case studies: one studies the relationship between inertial features and gait pathology in multiple sclerosis, and the other deals with the classification of head impacts in contact sports using inertial sensors. The information derived from both case studies has the potential to serve as a tool for monitoring a person's health.
Acknowledgement

First and foremost, I offer my sincerest gratitude to my advisor, John Lach, who has supported me throughout my Master's program with his patience and knowledge whilst allowing me the room to work in my own way. This thesis would not have been completed without his encouragement and effort. I simply could not wish for a better or friendlier advisor, and am grateful to him for taking me on as his student. I would also like to thank James Aylor and Laura Barnes for being a part of my Master's thesis committee.

Introduction
Over the past decade, mobile sensors have become an integral part of the modern lifestyle. With the advent of micro-electro-mechanical systems (MEMS), the physical size of sensors was reduced to the order of millimeters, making them easier to integrate into larger systems. With the help of such sensors, systems can analyze situations and make decisions, transforming them into smart systems.
Sensors are now employed in a wide range of applications from detecting DNA sequences to monitoring the environmental conditions on Mars.
One of the applications where sensors have been immensely useful is fitness tracking of human subjects.
Over the past few years, kinematic measurements have been widely used by clinicians for medical diagnosis [1]- [5]. Such measures are used to quantify both normal and pathological movements, quantify the degree of impairment, plan rehabilitation strategies and assess the effect of various interventions.
Small, low-powered inertial sensors have helped with the collection of kinematic measurements from human motions and provided the potential for dynamic three-dimensional motion analysis. These sensors measure rigid kinematic motion, such as acceleration via accelerometers and angular velocity via gyroscopes. They are currently used in real-world scenarios to gather accurate, high-resolution data corresponding to human motion.
Inertial sensors are used to study human kinematics for a wide variety of applications [1]. Lieber et al [4] described the role played by accelerometers in recording and predicting falls, recording and predicting freezing of gait, and evaluating postoperative recovery of subjects. Goodwin et al [6] developed a dynamical system using inertial sensors to study the spinal motion. Jamsa et al [7] showed that there exists a significant relationship between accelerometer-based data and proximal femur bone mineral density. Wong et al [8] monitored the human trunk movements using inertial sensors. Daukantas et al [9] utilized an accelerometer based sensor for extracting information related to the performance of a swimmer. Gong and Lach [10] employed inertial sensors to enhance the robotic surgical training process.
The location of the sensors and the information extracted from the inertial data differ across applications, but they share elements such as noise and human motion data. This noise may be due to improper placement of the sensor, infidelity of sensor recordings, unpredictable real-world scenarios, or the large number of degrees of freedom that human motion possesses. Thus, identifying subtle human movements and deriving the corresponding information to address a problem is an arduous task. To tackle this issue, a framework that incorporates domain-specific knowledge should be used.
A general framework for extraction of information from raw inertial data is already established for human activity recognition. There has been a lot of research done in this area, and some of the highly cited papers in this domain are [11]- [15]. Randell et al [11] computed four features and employed neural networks to cluster the activities. Bao et al [13] used a decision tree classifier with temporal features such as mean, energy, frequency-domain entropy, and correlation of acceleration data to classify activities.
Van Laerhoven et al [12] employed an algorithm which takes the sensor data as input and generates a probability/confidence value for a particular activity. Similarly, Kern et al [14] exploited a Bayes classifier with the raw sensor data as input to recognize activities. Ward et al [15] used the number of peaks within a 100 ms sliding window, the mean amplitude of these peaks, and the raw x-axis data as features for a hidden Markov model to classify activities. All these papers follow a common procedure for activity recognition, i.e. feature engineering followed by implementation of a classifier. The authors perform an exhaustive search within the available features and classifiers to choose the right ones, but they do not discuss the reasons behind choosing a specific feature or classifier.
In this work, I will integrate application-specific knowledge into the framework to derive features and utilize machine learning techniques which are coherent with the problem statement. I will demonstrate the working of the framework and simultaneously implement it on two different case studies. The framework I utilize is a sequential learning process, where investigation follows a logical stepwise path to find solutions. The sequential learning framework (SLF) in my work consists of three steps: data preparation, feature engineering and information extraction. The SLF, coupled with application-specific knowledge, will address the two case studies mentioned below.
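As an illustrative sketch only (the function names and toy data below are my own placeholders, not part of the thesis or any library), the three SLF stages can be outlined as a minimal pipeline:

```python
# Minimal sketch of the three-stage sequential learning framework (SLF).
# All names and the threshold "model" are illustrative placeholders.

def prepare_data(raw_records):
    """Stage 1: clean raw sensor records and keep only those with ground truth."""
    return [r for r in raw_records if r.get("label") is not None]

def engineer_features(records):
    """Stage 2: derive application-specific features from each record."""
    feats = []
    for r in records:
        signal = r["signal"]
        mean = sum(signal) / len(signal)
        energy = sum(s * s for s in signal)
        feats.append({"mean": mean, "energy": energy, "label": r["label"]})
    return feats

def extract_information(features):
    """Stage 3: learn a relationship between features and ground truth.
    Here a trivial energy-threshold rule stands in for a real classifier."""
    threshold = sum(f["energy"] for f in features) / len(features)
    return lambda f: f["energy"] > threshold

raw = [
    {"signal": [0.1, 0.2, 0.1], "label": "walk"},
    {"signal": [2.0, 2.5, 1.8], "label": "run"},
    {"signal": [0.0, 0.1, 0.0], "label": None},  # missing ground truth: dropped
]
clean = prepare_data(raw)
feats = engineer_features(clean)
model = extract_information(feats)
print(len(clean))  # 2 records survive data preparation
```

The point of the sketch is only the sequencing: each stage consumes the previous stage's output, which is the stepwise structure both case studies follow.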

Case Study 1 Preface
Multiple sclerosis (MS) is a chronic progressive neurological disorder affecting about 2.5 million people globally [16]. Impaired mobility is a common symptom of MS even at lower levels of the disease and has significant negative effects on quality of life [17]. As a result, walking assessment is critical in tracking the progression of MS.
There are various techniques to study human gait that are often quick and simple to use; however, such systems often lack valuable kinematic data. Tape measures and goniometers provide information in single planes and only for static positions. Electro-goniometers and inclinometers offer solutions for more than one plane, as well as provide dynamic data; however, the physical design of such sensors can restrict motion. Therefore, because of measurement drawbacks, it remains difficult for the clinician to gain information about dynamic three-dimensional movements. In contrast, laboratory systems are complex and expensive, but are capable of capturing high-quality three-dimensional movements. Two laboratory systems commonly found within the literature to study gait are force plate systems [18]- [20] and video-based optoelectronic systems [2], [5], [21], but both systems are time-consuming and complex [22].
Thus, employing inertial sensors is a better alternative for studying gait with higher precision than the aforementioned systems. Inertial sensors are portable, inexpensive and can record high-resolution gait data. Figure 1 shows some of the inertial sensors that are used to study human gait. Recently, gait impairment in MS subjects has been measured by performing the Six Minute Walk (6MW) test, in which the subject is asked to walk for six minutes as swiftly as possible to identify motor fatigue. Chetta et al [23] focused on the cardiorespiratory response during the 6MW in 11 MS subjects with mild disability. The authors showed that 6MW distance correlated with disability score, but not with subjective fatigue. Goldman et al [24] evaluated a modified 6MW in MS subjects with varying disability and in controls to assess test reliability. The authors also assessed the correlation of 6MW distance with subjective measures of fatigue, physical function and ambulation. However, the correlation of 6MW distance was limited to certain physiological parameters, and there remains a need to identify other gait features from the 6MW that may provide additional insight into gait impairment in MS. In order to address this issue, the motion of the subjects during the 6MW must be monitored with higher precision.
The next chapters will explain the techniques that were incorporated into the sequential learning framework (SLF) to extract gait features from 6MW and determine their physiological significance.

Data
The gait and mobility of 115 participants were assessed at the University of Virginia Department of Neurology. There were 86 MS subjects and 29 controls. The subjects were asked to perform the 6MW in a hallway with an inertial sensor attached to their dominant hip. For this study, an off-the-shelf inertial sensor, the Actigraph [25], was used to collect the gait data. This sensor has a triaxial accelerometer with a sampling frequency of 30 Hz. The clinicians manually recorded the distance covered by each subject for each minute, which gives the average walking speed for the corresponding minute. The clinicians also conducted three clinical tests for every 6MW, which include patient-completed questionnaires and functional tests. The three clinical assessments were:
1. The Expanded Disability Status Scale (EDSS) [26] is a ten-point scale, where zero indicates a normal neurological exam and ten indicates death from MS. This physician-assessed composite score is based on walking ability and a neurological examination of seven functional systems: pyramidal, cerebellar, brainstem, sensory, vision, bowel and bladder, and cognitive.
2. The Multiple Sclerosis Walking Scale (MSWS) [27] is a self-reported measure of the impact of MS on walking ability consisting of 12 physical items. The total score ranges from 0 to 100, with higher values indicating greater impairment.
3. The Modified Fatigue Impact Scale (MFIS) [28] is another self-reported measure concerning the impact of fatigue on daily life. The total score is the sum of the scores for the 21 items, each ranging from 0 to 4. Individual subscale scores for physical, cognitive, and psychosocial functioning can also be generated by calculating the sum of specific sets of items.

Contribution
I derived metrics based on the subtle variations among the gait cycles across the six minutes and later established a relationship between the metrics and the gait pathology of the subject.

Case Study 2 Preface
Concussions* are common in contact sports such as football, soccer and lacrosse, with approximately 1.6 to 3.8 million sport-related concussions reported in the USA annually [29]. In many cases, concussions are caused by a blow to the head or by the head and upper body being violently shaken. Concussion symptoms include headache, lack of concentration, and loss of memory, judgment, balance, and/or coordination.
Every concussion injures the brain to some extent [30]. Previous research found a statistically significant correlation between the number of head impacts and the resulting neurophysiological deficit [31]. Furthermore, the authors of [32] showed that repeated sub-concussive impacts are also associated with blood-brain barrier disruption, indicating elevated risk of cognitive deficits. However, the exact etiology of concussion is unclear.
The identification of risk factors that predispose an athlete to concussion will help medical experts understand the underlying mechanisms of concussion and aid in the improvement of prevention strategies.
One way to identify the root cause of concussions is by identifying the relationship between the number of direct hits to a player's head and brain injury. In order to study this, one needs the number of direct hits to the head (true hits) a player sustains during a game or season. A simple and straightforward way to count the true hits is to watch the game and tally the hits manually, but this requires a significant amount of human effort to monitor every player on the field and keep track of the hits. So an automated way to identify true hits would be effective.
An inertial sensor can be used to collect motion data of players' heads, and an automated algorithm can be implemented to differentiate the true hits from clacks†. To the best of the author's knowledge, there is no available work in the literature that has tried to identify true hits from real-world accelerometer data.
So, establishing a technique for detecting true head impacts will serve as a research tool to help investigate the relationship between direct head impacts and the concussions experienced by a player.
Upon investigating several temporal features, the estimation of stimuli* was regarded as the best feature for identifying true hits. This is because the manner in which players react after sustaining a similar impact to the head can vary considerably. This reaction depends on several human factors such as age, physical fitness, impact preparedness, etc. Thus, the inertial data may not be the same even if two players sustain the same hit. In this work, I will present an algorithm using the SLF for estimating the stimuli of the impacts.

Data
An off-the-shelf inertial sensor, the X2 Biopatch [33], was used to track and record the acceleration of the players' heads. The sensor was placed on the mastoid process of the skull, rather than on a helmet [34]. A helmet does not provide a clear picture of the motion of the skull because of the various protective layers embedded in it. Placing a sensor on the mastoid process gives more precise motion data of the skull and does not obstruct game play. Figure 2 shows the sensor placed on a person.
The sampling frequency of the sensor was 1000 Hz. The data collection study took place at Lynchburg College, where 16 players from the Women's Lacrosse team participated in the study. The sensor was used for 51 sessions, including games and practice sessions. Video footage was provided for some games and practice sessions, which was used to label the impacts as true hits or clacks. The X2 Biosystems software computed the following metrics, which were available for analysis:
1. Peak linear acceleration (PLA)
2. Peak rotational acceleration (PRA)

Contribution
I estimated the stimuli (the signals that cause the head movement) of the recorded head impacts. Further, I built a classifier using this estimated signal to differentiate true hits from clacks and compared its performance with other features.

Thesis Outline
Given the outlook and impact of this research, a summary of the thesis can be stated as follows:

This work utilizes a framework incorporating application-specific knowledge to extract meaningful information from raw inertial data corresponding to human motion. This framework is implemented on two medical case studies to derive information which can aid in health assessment.
The rest of this thesis is organized as follows: After the introduction of the framework and the case studies in Chapter 1, the steps taken to convert the raw data into usable data are presented in Chapter 2.
Then, Chapter 3 presents the techniques employed to extract the features in both case studies. Chapter 4 studies the relationship between the derived features and the available ground truth using machine learning techniques. Chapter 5 concludes the thesis and lays out future work.

Background
In order to analyze data for making predictions or classifications, one of the most important and strenuous steps is data preparation. Data preparation is the process of collecting, cleaning, and consolidating data into usable files or data tables for use in analysis. The process generally entails correcting errors (typically from human and/or machine input), filling in nulls and incomplete data, and transforming data into the desired format for analysis. The eventual output of this step is corrected data along with the corresponding ground truth.
Data preparation is a common step in big data analysis [35], web usage mining [36] and text mining [37].
Cooley et al [36] proposed data preparation techniques and algorithms to convert raw web server logs into user session files in order to perform web usage mining. The authors presented models that encode both the web site developers' and the users' views of how a web site should be used, in order to identify web site users, user sessions, and page accesses that are missing from a web server log.
McLellan et al [37] employed data preparation techniques, developed for the Centers for Disease Control and Prevention, to organize textual data. The authors laid out a set of rules and considerations for managing text data.
There is far less focus on this step when dealing with human motion data. When handling inertial data, the biggest task lies in segmenting the required data and labelling it accordingly. This step plays an important role when data collection happens in the real world, as the ground truth is most often not available in the desired format.
In this chapter, for case study 1, the 6MW data was derived from the raw accelerometer data and the clinical scores were assigned to the respective 6MW test. For case study 2, the impacts recorded by the sensor were labelled as a true hit/clack for analysis.

Case Study 1
The accelerometer data, walking speed and clinical scores were manually surveyed for missing data and other errors. Subjects with missing clinical scores or walking speed data were excluded from the analysis, so that the clinical scores and walking speed associated with each accelerometer recording were available for analysis.
The first task was to extract the 6MW walking data from the accelerometer data. Since the sensors start recording even before the subjects start walking, they tend to collect data that is not related to the 6MW test. In order to extract the 6MW data from the accelerometer data, a GUI was developed in MATLAB® to manually separate the region of interest. The user had to manually mark the start point and the end point of the 6MW using the cursor. Figure 3 displays a screenshot of the GUI. 6MW inertial data was extracted for all the subjects using this GUI.

Once the 6MW accelerometer data was obtained, all the gait cycles had to be extracted to identify the subtle variations among them. Gait is comprised of sequential gait cycles, and each gait cycle is composed of a sequence of events that mark the transition from one gait phase to another. In terms of temporal-domain parameters, the two most relevant events in a normal gait cycle are the initial heel contact or heel strike (HS) and the terminal contact or toe off (TO). Detecting HS and TO accurately is therefore of vital importance in clinical gait analysis to study the variation of gait patterns in high resolution. The gait patterns of the subjects in our study are quite variable, as shown in Figure 4, due to the heterogeneity of gait pathology in MS.

In order to handle the wide variety of gait patterns, our gait cycle extraction algorithm employs Fourier analysis to derive the gait cycles from the 6MW inertial data. Walking is predominantly a low-frequency activity and, as such, Fourier coefficients representing high-frequency signal content have very low amplitude. Most of the useful information closely related to the impact acceleration is contained in the band below 17 Hz [38]. Moreover, considering only frequencies greater than 0.25 Hz enables us to effectively separate the body acceleration (BA) and gravity acceleration (GA) components [39].
For our purpose, a bandpass filter with the frequency range 1-3 Hz was applied to the 6MW accelerometer magnitude* data. Figure 5 shows the frequency spectrum of the 6MW data after bandpass filtering. The maximum peak of the frequency spectrum was considered the fundamental cycle frequency, which is directly related to the average step rate of the six-minute walk. The gait length was calculated by dividing the sampling frequency of the accelerometer by the fundamental frequency of the 6MW. To reduce computational complexity, data segments of twice the corresponding gait length were extracted, with no overlap, from the 6MW inertial data, and these segments were treated as the gait cycles. The gait cycles for each minute of the 6MW were obtained and stored. Figure 6 shows the gait cycles extracted using this technique.

For measuring the variability between gait cycles, the cycles should be in-phase; in other words, the gait cycles should start from the same instant, such as heel strike. The drawback of this gait cycle extraction technique is that the extracted gait cycles might not all be in-phase: some may start with heel strike and others with toe off. This inconsistency across gait cycles is resolved in Chapter 3 by aligning the gait cycles using a phase-invariant DTW method.

* Magnitude = √(ax² + ay² + az²)
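The segmentation just described can be sketched in a few lines. This is a Python analogue of the thesis's MATLAB processing (function names and the synthetic test signal are mine), and frequency-domain masking stands in for the bandpass filter:

```python
import numpy as np

def extract_gait_cycles(mag, fs=30.0, band=(1.0, 3.0)):
    """Segment a 6MW acceleration-magnitude signal into gait cycles:
    1. restrict the spectrum to the 1-3 Hz walking band (a crude
       frequency-domain stand-in for the bandpass filter),
    2. take the spectral peak as the fundamental cycle frequency,
    3. gait length = fs / fundamental frequency; cut non-overlapping
       segments of twice the gait length, as in the thesis."""
    spectrum = np.abs(np.fft.rfft(mag - np.mean(mag)))
    freqs = np.fft.rfftfreq(len(mag), d=1.0 / fs)
    spectrum[(freqs < band[0]) | (freqs > band[1])] = 0.0
    f0 = freqs[np.argmax(spectrum)]          # fundamental cycle frequency
    seg_len = int(round(2 * fs / f0))        # two gait lengths per segment
    cycles = [mag[i * seg_len:(i + 1) * seg_len]
              for i in range(len(mag) // seg_len)]
    return f0, cycles

# Synthetic 60 s walk: a 2 Hz step oscillation sampled at 30 Hz
t = np.arange(0, 60, 1 / 30.0)
mag = 1.0 + 0.5 * np.sin(2 * np.pi * 2.0 * t)
f0, cycles = extract_gait_cycles(mag)
print(round(f0, 2), len(cycles))  # 2.0 60
```

For this clean synthetic signal the recovered fundamental frequency matches the step rate exactly; real 6MW data would produce a broader spectral peak around the average step rate.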

Case Study 2
For recording the head impacts in lacrosse, players were equipped with an X2 Biopatch sensor throughout the game. The sensor continuously reads the values from the accelerometer and saves the data if the acceleration is above a threshold (i.e. 10g). The sensor saves the ten samples prior to and the eighty-nine samples after the threshold crossing. Video footage was available for just 40 of the sessions. Labelling each impact required manual effort due to inconsistent timing, poor video quality, absence of players in the video frame, etc. These issues were addressed through manual supervision, finally yielding 252 labelled impacts that could be used for analysis. Out of the 252 impacts, 214 were true hits and 38 were clacks. Figure 7 shows an example of a true hit and a clack*.
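The sensor's threshold-triggered capture can be sketched as follows. This is a minimal sketch of the general mechanism, not X2 Biosystems' actual firmware; it assumes the saved window is ten pre-trigger plus eighty-nine post-trigger samples around the trigger sample (100 samples, i.e. 0.1 s at 1000 Hz):

```python
from collections import deque

G = 9.81  # 1 g in m/s^2

def capture_impacts(samples, threshold=10 * G, pre=10, post=89):
    """Threshold-triggered impact capture: keep a small pre-trigger
    history, and when |a| exceeds the 10g threshold, save `pre` samples
    before the trigger and `post` samples after it (100 samples total)."""
    history = deque(maxlen=pre)   # rolling pre-trigger buffer
    impacts = []
    i = 0
    while i < len(samples):
        s = samples[i]
        if abs(s) >= threshold:
            window = list(history) + samples[i:i + post + 1]
            impacts.append(window)
            i += post + 1         # skip past the saved window
            history.clear()
        else:
            history.append(s)
            i += 1
    return impacts

quiet = [0.5 * G] * 500
hit = [12 * G]                    # one sample above the 10g threshold
signal = quiet + hit + quiet
impacts = capture_impacts(signal)
print(len(impacts), len(impacts[0]))  # 1 impact of 100 samples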
The stimulus for each of the labelled impacts is estimated in the next chapter using a linear dynamical model (LDM).

Background
A feature is a piece of information that can be used for prediction/classification. The features derived from the data will influence the results that are achieved from the predictive models. The process of using domain knowledge of the data to create attributes is called feature engineering. There are a number of feature extraction tools available in the literature which can be applied to inertial data.
In [40]- [42], authors derived time-domain features from inertial data including the mean, median, variance, skewness, kurtosis and inter-quartile range. Aminian et al [43] computed the cross-correlation coefficients between accelerometer axes. Foerster et al [44] employed the fast Fourier transform (FFT) to calculate the median of the spectral distribution, whereas Preece et al [45] used a subset of the FFT coefficients as features. Huynh et al [46] derived the spectral energy, which is the sum of the squared FFT coefficients, whereas Bao et al [13] computed the frequency-domain entropy, which is the normalized information entropy of the FFT components. In [38], [47], [48], authors used wavelet analysis on inertial data to calculate wavelet parameters based on the sum of the squares or RMS* of specific detail coefficients. Similarly, Wang et al [49] derived wavelet parameters using simple statistical measures, such as the standard deviation and RMS, of specific approximation and detail coefficients.
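Two of the frequency-domain features mentioned above, spectral energy and frequency-domain entropy, are simple to compute. The sketch below follows their standard definitions (the test signals are mine):

```python
import numpy as np

def spectral_features(signal):
    """Spectral energy (sum of squared FFT magnitudes, as in Huynh et
    al) and frequency-domain entropy (normalized information entropy
    of the FFT components, as in Bao et al), computed on the
    mean-removed signal with the DC bin dropped."""
    mags = np.abs(np.fft.rfft(signal - np.mean(signal)))[1:]
    power = mags ** 2
    energy = float(np.sum(power))
    p = power / np.sum(power)                # normalized spectrum
    entropy = float(-np.sum(p * np.log2(p + 1e-12)))
    return energy, entropy

t = np.arange(0, 1, 1 / 64)
pure = np.sin(2 * np.pi * 8 * t)                  # single tone
noisy = np.random.default_rng(0).normal(size=64)  # broadband noise
e1, h1 = spectral_features(pure)
e2, h2 = spectral_features(noisy)
print(h1 < h2)  # True: a pure tone has lower spectral entropy than noise
```

The entropy feature is what lets a classifier distinguish periodic activities (walking) from irregular ones, which is why it recurs throughout this literature.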
In this chapter, for case study 1, features which measure the variability of the gait cycles across the six minutes were computed. For case study 2, upon examining the temporal features derived from the inertial data and understanding the complexity of the problem, the stimuli of the impacts were estimated using a linear dynamical model.

* RMS stands for Root Mean Square

Case Study 1
To study the subtle changes in gait cycles across the six minutes, the gait cycles from minutes 2-6 were compared with the gait cycles of minute 1. The changes in gait patterns were studied for every subject, rather than comparing gait cycles across subjects, as walking patterns vary amongst subjects. The most common way to compare two sequences is by computing the Euclidean distance, but Euclidean distance is ineffective when the sequences are temporally scaled. To avoid this issue, we employed an alternative algorithm, dynamic time warping (DTW), to measure the similarity between two temporal sequences which may vary in speed. An example showing the difference between DTW and Euclidean distance is given in Figure 8. Assume that the green and blue curves are two different sequences. The black lines show how the two distances compare the signals: Euclidean distance follows a linear temporal mapping between the two sequences, whereas DTW adopts a non-linear temporal mapping. DTW is commonly used to analyze temporal sequences of video, audio, and graphics data; a well-known application is automatic speech recognition [50], to cope with different speaking speeds. The algorithm determines an optimal match between two given sequences and returns a distance measure known as the DTW distance. The sequences are warped nonlinearly in the time dimension to match each other as closely as possible. We define the number of warps needed to align the two sequences as the warping length.
Since the gait cycles from a particular 6MW were of the same length, consider P and Q to be two gait cycles of length n each:

P = (p₁, p₂, ..., pₙ), Q = (q₁, q₂, ..., qₙ)

where pᵢ and qᵢ are 3-dimensional vectors containing the x, y and z acceleration values respectively.

These two sequences P and Q were provided as inputs to the DTW algorithm, and the outputs were the DTW distance between P and Q and the warped sequences P_w and Q_w, in which (s)ₖ denotes k repetitions of the sample s.

The DTW distance is the Euclidean distance between P_w and Q_w:

DTW(P, Q) = √( Σₖ ‖P_w(k) - Q_w(k)‖² )    (1)

The warping length, defined as the number of warps needed to align the two sequences, is the number of repeated samples inserted during warping:

Warp(P, Q) = |P_w| - n    (2)

The hurdle of misaligned gait cycles, mentioned in Chapter 2, was tackled using a phase-invariant implementation of dynamic time warping. To ensure the gait cycles were in-phase, one of the gait cycles (either P or Q) was shifted in a cyclic manner until the best match was found, before running the DTW algorithm. Keogh lower bounding and early DTW termination were used to find a computationally optimal solution [51].
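The phase-invariant alignment described above can be sketched as follows. This is an illustrative pure-Python version: it uses the standard sum-of-costs DTW distance (which differs slightly from the thesis's Euclidean formulation) and omits the Keogh lower bounding and early-termination speedups:

```python
import math

def dtw(P, Q):
    """Classic DTW between two sequences of 3-axis samples. Returns the
    DTW distance and the warped pair (P_w, Q_w); the warping length of
    P is len(P_w) - len(P)."""
    n, m = len(P), len(Q)
    INF = float("inf")
    # cost matrix filled with the standard DTW recurrence
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(P[i - 1], Q[j - 1])
            D[i][j] = cost + min(D[i-1][j], D[i][j-1], D[i-1][j-1])
    # backtrack the optimal warping path
    path, i, j = [], n, m
    while (i, j) != (1, 1):
        path.append((i - 1, j - 1))
        moves = {(i-1, j): D[i-1][j], (i, j-1): D[i][j-1],
                 (i-1, j-1): D[i-1][j-1]}
        i, j = min(moves, key=moves.get)
    path.append((0, 0))
    path.reverse()
    P_w = [P[a] for a, _ in path]
    Q_w = [Q[b] for _, b in path]
    return D[n][m], P_w, Q_w

def phase_invariant_dtw(P, Q):
    """Cyclically shift Q and keep the best DTW match, so gait cycles
    starting at heel strike vs. toe off still align."""
    best = None
    for s in range(len(Q)):
        shifted = Q[s:] + Q[:s]
        result = dtw(P, shifted)
        if best is None or result[0] < best[0]:
            best = result
    return best

P = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (1, 0, 0)]
Q = [(2, 0, 0), (1, 0, 0), (0, 0, 0), (1, 0, 0)]  # same cycle, out of phase
dist, P_w, Q_w = phase_invariant_dtw(P, Q)
print(dist)  # 0.0: a cyclic shift aligns the two cycles exactly
```

Without the cyclic search, plain DTW on P and Q would return a nonzero distance purely because of the phase offset, which is the misalignment problem noted in Chapter 2.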
For each subject, all the gait cycles from the first minute were matched with every gait cycle from the rest of the minutes using the DTW algorithm explained above. For instance, to compute the DTW score and Warp score for the second minute, all the gait cycles from minute 2 were matched with all the cycles from minute 1. Each gait cycle from minute 2 then has a DTW distance and a warping length with respect to each gait cycle from minute 1, and the least DTW distance and warping length were chosen for each gait cycle from minute 2. Finally, the medians across all the chosen DTW distances and warping lengths were computed and considered the "DTW score" and "Warp score" for minute 2.
The same approach was repeated for the rest of the minutes. Following this procedure, the DTW score and Warp score for minutes 2 to 6 were computed with respect to the first minute for every 6MW.
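The minimum-then-median aggregation above can be sketched directly (the tiny distance matrices are invented for illustration; in practice each entry would come from the phase-invariant DTW comparison):

```python
import statistics

def minute_scores(dist, warp):
    """Given, for one minute m of a subject's 6MW, a matrix dist[i][j]
    of DTW distances and warp[i][j] of warping lengths between gait
    cycle i of minute m and gait cycle j of minute 1: take the least
    distance / warping length for each cycle i, then the median over
    all cycles, yielding the minute's "DTW score" and "Warp score"."""
    best_d = [min(row) for row in dist]
    best_w = [min(row) for row in warp]
    return statistics.median(best_d), statistics.median(best_w)

dist = [[3.0, 1.0, 2.0],
        [4.0, 2.5, 5.0],
        [0.5, 6.0, 1.5]]
warp = [[2, 0, 1],
        [3, 1, 4],
        [0, 5, 2]]
print(minute_scores(dist, warp))  # (1.0, 0)
```

Using the median of the per-cycle minima makes the scores robust to the occasional badly segmented gait cycle.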
The gait features (DTW score and Warp score), the walking speed and the clinical scores of every 6MW were used for analysis in the next chapter to determine the physiological significance of the gait features.

Case Study 2
A number of linear and non-linear features* such as mean, standard deviation, skew, energy, etc. were extracted from the impacts' data. Figure 9 shows the boxplots for a few of these features; there is high overlap between the two classes (true hits/clacks). There was no distinction between true hits and clacks using temporal features such as PLA, PRA, PLV, the minimum and kurtosis of the acceleration, and the mean and skew of the gradient of the acceleration. One reason for the lack of separation between the classes might be the dynamic environment on the field: the players' heads move vigorously while running, jumping, attacking, throwing and defending.
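For concreteness, a few of the temporal features examined can be computed as below. The formulas are the standard moment-based definitions; the exact feature set and the example window are not taken from the study's data:

```python
import math

def temporal_features(impact):
    """Simple temporal features for one impact window of acceleration
    magnitudes: peak linear acceleration, moments, and signal energy."""
    n = len(impact)
    mean = sum(impact) / n
    var = sum((s - mean) ** 2 for s in impact) / n
    std = math.sqrt(var)
    skew = sum((s - mean) ** 3 for s in impact) / (n * std ** 3) if std else 0.0
    kurt = sum((s - mean) ** 4 for s in impact) / (n * std ** 4) if std else 0.0
    return {
        "pla": max(impact),                    # peak linear acceleration
        "mean": mean,
        "std": std,
        "skew": skew,
        "kurtosis": kurt,
        "energy": sum(s * s for s in impact),  # signal energy
    }

feats = temporal_features([1.0, 1.2, 0.9, 12.0, 3.0, 1.1])
print(feats["pla"])  # 12.0
```

Features like these summarize each impact as a single point; the overlap in Figure 9 shows that such summaries discard too much of the impact's temporal structure to separate true hits from clacks.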
Though true hits and clacks occur due to two different causes, the inertial data corresponding to both possess similar characteristics. So it can be assumed that human factors play a major role in converting the driving forces into sensor recordings. Thus, the driving forces/stimuli* would be the key to differentiating true hits from clacks.
Models like Hidden Markov Models (HMMs) [52] can be used to estimate the stimuli of the impacts, but such models experience an exponential increase in parameters as more signal history is encoded. Thus, dynamical models were used to estimate the stimuli. In this work, human motion analysis was viewed as a blind system identification problem, where head motion is an unknown dynamical model driven by an unknown input. Intuitively, the dynamical model represents physical characteristics of an actor, such as mass and inertia, whereas the input represents the driving signal, a signature of the action.

* A stimulus is the event that causes the head motion.
The dynamical model used for estimation of stimuli from the inertial data is inspired by Raptis' work [53]. The model, proposed in [53], falls into the class of linear dynamical models, where the task of motion modeling is posed as a system identification problem under bounded-energy and sparsity constraints [54]. The motivations to employ this model are two-fold. Firstly, the inertial data observed from human actions is a non-stationary, multivariate time series, and this model accommodates such data. Secondly, the core hypothesis of this model is that the multivariate time series y(t) is the output of a linear time-invariant dynamical system driven by a one-dimensional sparse and bounded input u(t). Human actions are non-linear and cannot, in general, be modelled by linear equations; however, since the impacts recorded in our data were of just 0.1 seconds duration, the human actions were assumed to be linear over such small durations and to abide by the hypothesis of the model [55].
Thus, u(t) is considered the stimulus for the impact y(t).
Generally, when the sensor records an impact with high acceleration values over very few samples, it corresponds to a direct impact to the head. Figure 7 shows an example of a true hit and a clack. A good way to take advantage of these high acceleration values over short intervals is to compute the gradient of the signal.
So deriving the jerk signal from the raw inertial signal will amplify the sudden movements of the skull across all directions. Thus, the jerk signal j(t) is considered the output of the linear dynamical model, which is driven by u(t). The jerk signal is computed by taking the gradient of the raw accelerometer signal a(t):

j(t) = (ȧx(t), ȧy(t), ȧz(t))

where ȧx, ȧy, ȧz stand for the first-order derivatives of each of the axes x, y and z. Figure 11 displays the jerk signal of the accelerometer data shown in Figure 7. The linear dynamical model is defined by the system matrices A, B, C and a state vector x(t). The model can be expressed as follows:

x(t + 1) = A x(t) + B u(t)
j(t) = C x(t)
subject to ‖u‖₂ bounded and ‖u‖₀ ≤ s    (3)

where ‖u‖₀ is the number of nonzero elements in the sequence u. One aspect of the model that plays an important role is bounding the input signal. This forces variables such as the amplitude of an action into the system matrices, thus resulting in inputs that are more comparable across individuals. An expectation-maximization (EM) algorithm [53] was applied to estimate the A, B, C and x(0) matrices under the assumption of system linearity and time invariance.
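The jerk computation itself is a one-liner on a sampled window. This is a Python sketch (the spike window is synthetic; the thesis computed this in MATLAB):

```python
import numpy as np

def jerk_signal(acc, fs=1000.0):
    """Per-axis first-order derivative (jerk) of a 3-axis acceleration
    window, amplifying the short, sharp transients typical of direct
    head impacts. `acc` has shape (n_samples, 3); fs is 1000 Hz for
    the X2 Biopatch."""
    return np.gradient(acc, 1.0 / fs, axis=0)

# Toy 0.1 s window: one sharp spike on the x axis
acc = np.zeros((100, 3))
acc[50, 0] = 120.0                      # sudden 120 m/s^2 spike
j = jerk_signal(acc)
print(j.shape)  # (100, 3)
```

A single-sample spike of 120 m/s² at 1000 Hz yields a jerk magnitude of 60000 m/s³ at the neighboring samples (central differences), illustrating how differentiation amplifies sudden skull movements relative to slower motion.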
Initially, u was chosen randomly, subject to the conditions in (3), and then A, B, C and x were estimated.
Then the sequence u was updated using the estimated A, B, C and x matrices. This process continued iteratively until the error between the estimated and the actual y was reduced. Once the error converged, the input sequence u was obtained, which is the stimulus for the reaction y. Figure 12 displays an example of the estimated stimulus for an impact. The stimuli for all the verified impacts were computed using this linear dynamical model* approach. In the next chapter, a classifier was trained to differentiate the true hits from the clacks using the estimated stimuli, and the performance of the estimated stimuli was compared with several other features.
* The linear dynamical model was built in MATLAB® R2013b (Intel i7-4770 CPU, 8GB RAM). The average computation time for each stimulus was 6.91 seconds. The algorithm is computationally expensive, so additional constraints will be required to implement it on a smaller processor for real-time applications.

Background
Information, in simple words, is the answer to a question. Information extraction is the process of finding factual evidence, by identifying instances of and relationships between entities, to solve a problem. This step in the SLF focuses on modeling and knowledge discovery for predictive/classification purposes using the features derived in the earlier step. Information extraction exploits machine learning techniques and statistical analysis to build models that learn from and make predictions on the data. Machine learning techniques exploit algorithms that iteratively learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. These techniques are used for classification, regression, clustering, dimensionality reduction and density estimation.
Previously, several machine learning techniques and statistical measures have been exploited to understand relationships between human motion and inertial features. In [11], [41] and [56], the authors employed neural networks to identify human actions like walking, standing, running, sitting, and walking upstairs and downstairs. Jamsa et al. [7] used Pearson's correlation coefficients to study the association between acceleration peaks and bone mineral density changes. Wang et al. [49] exploited multi-layer perceptron neural networks to differentiate walking patterns. Maurer et al. [40] adopted online nearest neighborhood and linear discriminant analysis to differentiate human activities like walking, standing, running, sitting, ascending and descending. Kern et al. [14] employed a Bayes classifier to recognize activities like walking, sitting, standing, typing and writing. Herren et al. [42] exploited multiple regression analysis to predict the incline and the speed of the runner. Parkka et al. [57] utilized decision trees to identify rowing, cycling and more.
The type of information to be derived depends on the application itself. For my work, the extracted information has to be actionable and useful for medical experts. This way, the information acts as a tool for clinicians to assess and track the health of subjects.
In this chapter, for case study 1, the correlation between the DTW-based scores and gait pathology was determined. For case study 2, a classifier was modelled to differentiate the true hits from the clacks using the estimated stimuli, and its performance was compared with other features.

Case Study 1
In this section, models were developed to find the relation between the gait features, walking speed and the clinical scores. In order to show that the gait features are closely related to the clinical scores (MFIS, EDSS, MSWS) and more significant than walking speed, three sets of six separate models were built.
These models were implemented using least squares regression. A linear regression model is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory (or independent) variables denoted by X. In my case, the dependent (or response) variable represented one of the clinical scores and the independent variables corresponded to the derived gait features or walking speed. Each set of models represented one of the clinical scores with six different combinations of DTW Score, Warp Score and walking speed. Table 1 shows all the different models.
A linear regression model can be represented as

y = Xw + ε

where w is the weights vector, which has to be estimated using the y and X matrices, y is the column vector of clinical scores and X is the matrix of features. This method was followed to build the models in Table 1, indexed by the clinical score (EDSS, MSWS or MFIS) and the model number. The weights are chosen to minimize the sum of squared errors between the estimated clinical score, ŷ, and the true clinical score, y. The ordinary least squares solution is given in (4):

w = (XᵀX)⁻¹ Xᵀ y        (4)

Once the weights were estimated for all the models in all the sets, model performance was evaluated to determine the best model explaining each MS clinical score. The following statistical performance metrics* were computed: adjusted R squared, Akaike information criterion (AIC), Bayesian information criterion (BIC) and significance value. Table 2 shows these performance metrics for all the models. From Table 2, the best model in each set was selected. The residuals vs fitted plots for the three selected models do not exhibit any patterns, which indicates that the models satisfy the linearity and homoscedasticity assumptions. In the normal Q-Q plots for all three models, the residuals line up well with the dotted line, which is evidence that the distribution of the observations is Gaussian. The residuals vs leverage plots show that the Cook's distance scores of all the observations are less than 1, suggesting that there are no influential points which could manipulate the regression model. All the diagnostic curves provide evidence that the models fit the data well.
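The ordinary least squares solution in (4) can be sketched as follows; the feature matrix and score vector here are illustrative placeholders, not values from the study:

```python
import numpy as np

# Hypothetical design: column of ones (intercept) plus two gait features;
# y stands in for one clinical score. Values are illustrative only.
X = np.array([[1.0, 0.8, 1.2],
              [1.0, 1.1, 0.9],
              [1.0, 1.5, 1.4],
              [1.0, 0.6, 2.0],
              [1.0, 1.3, 1.1]])
y = np.array([2.0, 2.5, 3.9, 3.1, 3.0])

# Ordinary least squares solution of (4): w = (X^T X)^{-1} X^T y.
# lstsq evaluates the same estimate in a numerically stable way.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ w
residuals = y - y_hat
```

The residuals are what the diagnostic curves in Appendix C examine: they should be orthogonal to the columns of X and show no systematic pattern.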
The predictors of the best models that fit the clinical scores are a combination of walking speed, DTW Score and Warp Score. This suggests that walking speed together with the DTW Score and Warp Score performs better in all three cases than walking speed alone. Thus, the DTW Score and Warp Score encompass certain domains of the MS clinical scores that walking speed does not. To understand what the DTW Score and Warp Score explain that walking speed does not, the direction of inference was reversed by evaluating the contribution of the various clinical measures to the DTW Score, Warp Score and walking speed. In other words, the DTW Score, Warp Score and walking speed act as the response variables and the clinical scores as the independent variables.
Three models were employed to analyze the physiological significance of our features. All models utilized stepwise regression, in which significant features are selected to best explain the dependent variable. Stepwise regression is an approach for selecting a subset of features for a regression model. It is a semi-automated process of building a model by successively adding or removing features based on the AIC performance metric of the model and the t-statistic of the feature. The focus of stepwise regression is to find the best combination of independent features to fit the dependent variable. Each of the MS clinical scores (EDSS, MFIS and MSWS) is a composite of individual sub-scores [26]-[28]. All these sub-scores of the MS clinical scores were the candidate features for all three models, whereas the two gait features and the walking speed served as the dependent variable of the respective models. The model variables are listed in Table 3, where t is 1, 2, 3, 4, 5 or 6 minutes.
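A minimal sketch of forward stepwise selection driven by AIC, under the simplifying assumption that features are only added (the thesis' procedure also considers removals and the features' t-statistics); all data below are synthetic:

```python
import numpy as np

def aic_ols(X, y):
    """AIC of a Gaussian OLS fit, up to a constant: n*ln(RSS/n) + 2k."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ w) ** 2))
    n, k = X.shape
    return n * np.log(rss / n) + 2 * k

def forward_stepwise(features, y):
    """Greedy forward selection: repeatedly add the feature that lowers
    AIC the most; stop when no addition improves AIC."""
    n = len(y)
    selected, current = [], np.ones((n, 1))     # intercept-only start
    best_aic = aic_ols(current, y)
    improved = True
    while improved:
        improved = False
        for name, col in features.items():
            if name in selected:
                continue
            cand = np.column_stack([current, col])
            cand_aic = aic_ols(cand, y)
            if cand_aic < best_aic:
                best_aic, best_name, best_X = cand_aic, name, cand
                improved = True
        if improved:
            selected.append(best_name)
            current = best_X
    return selected

# Synthetic example: the response depends on f1 only; f2 is pure noise.
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=50), rng.normal(size=50)
score = 2.0 * f1 + 0.1 * rng.normal(size=50)
selected = forward_stepwise({"f1": f1, "f2": f2}, score)
```

The informative feature is picked up first because it yields the largest AIC drop; a noise feature usually fails to offset the 2k penalty.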

EDSS + MFIS + MSWS
The technique shown in the flowchart in Figure 14 was exploited to find the clinical scores that are closely related to the gait features and the walking speed. The DTW Score, Warp Score and walking speed for every minute were each considered as a dependent variable for the stepwise regression model. The reason for considering each minute separately is that it was previously shown in [24] that the gait pattern changes over the course of the walk. Table 4 shows the significant clinical scores that best fit each response variable. The questionnaires and functional tests that were conducted relate to cognitive thinking, physical fatigue, balance, coordination and other symptoms. Gait pathology can be related to muscle weakness, stiffness and spasticity. For simplicity, all questions and functional tests have been segregated into three categories, namely "Balance", "Physical Fatigue" and "Others". The category "Balance" relates to stability while walking, whereas "Physical Fatigue" relates to declining walking performance brought on by exertion.
Questions relating to cognitive fatigue were included in the "Others" category. The last column in Table 4 specifies the category corresponding to each clinical score. Figure 15 shows the contributions of the three categories. Walking speed has the highest contribution from the category "Others", including several symptoms of cognitive fatigue, whereas the DTW Score and Warp Score have the highest contributions from "Physical Fatigue" and "Balance" respectively. This shows that walking speed relates to many diverse clinical variables, whereas the DTW Score and Warp Score are primarily a function of gait impairment.

Case Study 2
A classifier was trained to identify the true hits from the clacks. In machine learning, classification is the problem of identifying to which category a new observation belongs, on the basis of a training set of data containing observations (or instances) whose categories are known. Classification is considered an instance of supervised learning, i.e. learning from a set of correctly identified observations and labelling the unidentified observations. Two types of classifiers were implemented:

 Probabilistic Classifier
This type of classifier learns the probability distribution over the set of classes and uses that to make predictions.

 Non-Probabilistic Classifier
This type of classifier does not attempt to model the underlying probability distributions, but tries to build the decision curve itself to make predictions.
One probabilistic classifier and two non-probabilistic classifiers were implemented, which can be found in Table 5.

Support Vector Machine
Logistic Regression

The performance of the classifiers was validated using a variant of exhaustive cross-validation called leave-one-out cross-validation (LOOCV). This cross-validation technique uses one observation as the testing set and the remaining observations as the training set. This is repeated until every observation has been used as the testing set. This validation technique was chosen due to the small dataset available.
It uses the available data more efficiently than other cross-validation methods, as only one observation is omitted at each step. All the above classifiers were trained using LOOCV with four different sets of features, which are mentioned below.  The labels of the impacts were used as the response variable of the model.  Thus, this reduced set of features was used to train the classifiers.
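The LOOCV procedure can be sketched as follows; the 1-nearest-neighbour stand-in classifier and the toy data are assumptions for illustration, not the classifiers used in the thesis:

```python
import numpy as np

def loocv_accuracy(X, y, fit_predict):
    """Leave-one-out cross-validation: each observation is held out once
    as the test set while the model trains on the remaining n-1 points."""
    n = len(y)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i
        pred = fit_predict(X[mask], y[mask], X[i])
        correct += int(pred == y[i])
    return correct / n

def nearest_neighbour(X_train, y_train, x_test):
    # Stand-in 1-NN classifier (the thesis used QDA, SVM and logistic
    # regression at this step).
    d = np.linalg.norm(X_train - x_test, axis=1)
    return y_train[np.argmin(d)]

# Toy, well-separated two-class impacts.
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [2.0, 2.1], [2.2, 2.0], [2.1, 1.9]])
y = np.array([0, 0, 0, 1, 1, 1])
acc = loocv_accuracy(X, y, nearest_neighbour)
```

Every observation is scored by a model that never saw it, which is why LOOCV is attractive when, as here, the dataset is small.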
Moreover, feature set D has the best performance metrics compared with the other feature sets, which suggests that it may be a good predictor of true hits.

Quadratic Discriminant Analysis
Quadratic discriminant analysis (QDA) is a probabilistic classifier that models each class with its own Gaussian distribution, yielding a quadratic decision boundary between the classes. Previously, Liu et al. [58] used QDA in their algorithm to detect backward falls, prior to impact, in the elderly population. Palmerini et al. [59] tested QDA to analyze postural instabilities in subjects suffering from Parkinson's disease using inertial data. Rebersek et al. [60] adopted QDA to detect the onset of gait initiation, the first heel-off and the first toe-off. Ganea et al. [61] utilized QDA to investigate the alteration of the gait pattern in children with Duchenne muscular dystrophy, using body-worn inertial sensors.
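A minimal sketch of QDA posteriors under the per-class Gaussian assumption; the toy data and class layout are illustrative only:

```python
import numpy as np

def qda_fit(X, y):
    """Fit a Gaussian with its own mean and covariance to each class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0),
                     np.cov(Xc, rowvar=False),
                     len(Xc) / len(X))
    return params

def qda_posterior(params, x):
    """Posterior P(class | x) via Bayes' rule over the class Gaussians."""
    scores = {}
    for c, (mu, cov, prior) in params.items():
        d = x - mu
        _, logdet = np.linalg.slogdet(cov)
        loglik = -0.5 * (d @ np.linalg.solve(cov, d) + logdet)
        scores[c] = prior * np.exp(loglik)
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Two well-separated synthetic classes in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
               rng.normal(3.0, 0.5, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)
model = qda_fit(X, y)
post = qda_posterior(model, np.array([3.0, 3.0]))
```

When the class distributions overlap heavily, as with the impact data below, these posteriors collapse toward uninformative values.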
QDA, with the four feature sets, was implemented on the labelled impacts, and the posterior probability of each test instance was calculated. Figure 16 shows the distribution of the posterior probabilities of all the impacts. The posterior probabilities computed using QDA for sets A, B and D lie near 0. For feature set C, the probabilities look spread out, but a distinct separation between true hits and clacks cannot be identified with the naked eye. In order to compare the performance of the feature sets with QDA, ROC* curves were examined along with the areas under the curves (AUROC*), which are shown in Table 8.
The AUROC values for all the feature sets are less than or near 0.5, which shows that these features with QDA are no better than random guessing. Also, the ROC curves for feature sets A, B and D are completely blue in color, which shows that all the posterior probabilities for true hits and clacks lie near 0.
Overall, QDA performs poorly; this might be due to the overlap between the distributions of the two classes. For this reason, the non-probabilistic classifiers, which avoid modelling the class distributions, were employed.
* See Appendix B for definitions

A support vector machine (SVM) is a non-probabilistic classifier that builds the decision boundary directly by finding the maximum-margin separator between the two classes. Previously, [63]-[65] adopted SVM to identify human activities using inertial data. Patel et al. [66] implemented SVM to estimate the severity of tremor, bradykinesia and dyskinesia using accelerometer data. Kumar et al. [67] incorporated SVM into their technique to predict the National Institutes of Health Stroke Scale (NIHSS) stroke index of post-acute stroke patients using inertial data. Shibuya et al. [68] adopted SVM to build a real-time fall detection system.
The SVM algorithm was trained and tested on the labelled dataset, along with Platt scaling to obtain posterior probabilities for all the impacts. Figure 17 shows the distribution of the posterior probabilities of the true hits and the clacks. For set C, the density plot is a sharp spike because the posterior probability values for all the test impacts lie in the range 0.96-0.97. There is little or no separation between the true hits and clacks using the four feature sets, as shown in Figure 17. For this reason, the best feature set and the optimal threshold cannot be chosen by examining the plots with the naked eye. In order to handle this issue, ROC* curves were examined along with the areas under the curves (AUROC*).
The ROC curves for all the feature sets and the AUROC metrics are shown in Table 9. The AUROC value for feature set C is less than 0.5, which means that the SVM classifier with this feature set was no better than random guessing. This was no surprise because the probability distributions of the true hits and the clacks overlap, as shown in Figure 17. The SVM classifier with set B has the highest AUROC value compared with the other feature sets. Thus, the SVM classifier performed best with feature set B.
The optimal threshold to classify the impacts is decided after examining the metrics from the logistic regression classifier.

Logistic Regression
Logistic regression predicts the probability of an outcome that can only have two values (i.e. a dichotomy). The difference between linear regression and logistic regression is that the former assumes a linear relationship between the response variable and the features, whereas the latter assumes a linear relationship between the natural logarithm of the odds of the response variable and the features. Figure 18 gives a pictorial view of the difference between linear and logistic regression decision curves. Since the number of classes is two, logistic regression assumes that the instances have a Bernoulli distribution and produces a logistic curve, which is limited to values between 0 and 1. The logistic regression model can be represented as shown in (5):

ln( p / (1 − p) ) = Σᵢ wᵢ xᵢ        (5)

where p, wᵢ and xᵢ refer to the probability, the i-th weight and the i-th predictor variable respectively.
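A hypothetical sketch of fitting (5) by gradient ascent on the Bernoulli log-likelihood; the learning rate, iteration count and synthetic data are assumptions, not the thesis' training setup:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, iters=500):
    """Gradient ascent on the Bernoulli log-likelihood of (5):
    ln(p / (1 - p)) = X @ w, hence p = sigmoid(X @ w)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ w)
        w += lr * X.T @ (y - p) / len(y)   # average gradient step
    return w

# Synthetic dichotomous outcome driven by one feature (intercept first).
rng = np.random.default_rng(2)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])
y = (x + 0.3 * rng.normal(size=200) > 0).astype(float)
w = fit_logistic(X, y)
p = sigmoid(X @ w)           # posterior probabilities, bounded in (0, 1)
```

The sigmoid keeps every prediction strictly between 0 and 1, which is what makes the output usable as a posterior probability for thresholding.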
Knighten et al. [71] used logistic regression to identify social gestures of humans using a wrist-worn smart band. Masse et al. [72] adopted logistic regression and inertial sensors to improve postural transition recognition in mobility-impaired stroke patients. Logistic regression was tested on all the feature sets and the posterior probabilities were computed for all the impacts. Figure 19 shows the probability distributions of true hits and clacks. The overlap between the distributions of true hits and clacks was highest for feature set C, compared with feature sets A, B and D. Feature sets A, B and D have similar distributions, where the true hits accumulate near 1 and the clacks are evenly spread out, but feature set D has more true hits near 1 than A and B. The posterior probability distribution of the true hits, for feature set D, is high near 1, whereas for the clacks it is spread evenly from 0 through 1. To decide the best feature set and the optimal threshold, the ROC curves were examined along with the AUROC values.
The ROC curves and the AUROC metrics generated for the available feature sets using a logistic regression are shown in Table 10.
Though the AUROC of set C with logistic regression was higher than with SVM, it was still low compared with the AUROC values of the other feature sets. So feature set C performs poorly with all the classifiers (i.e. QDA, SVM and logistic regression). Logistic regression with set D has the highest AUROC. Table 11 shows the AUROC values for all combinations of the feature sets and the two non-probabilistic classifiers. Feature set D with logistic regression is the best classifier among all the possible combinations, with the highest AUROC value. The thresholds and the corresponding classification metrics for the five cases are shown in Table 12. To choose the right threshold, the expertise of a clinician was required, as it is necessary to know which of the measures in Table 12 weigh higher in the clinical domain. After consulting a medical expert, case d was chosen, as it is important not to have a large number of false negatives. In other words, it is important to identify most of the true hits, even at the cost of sometimes labelling clacks as hits. So 0.63 was chosen as the optimal threshold to classify the true hits from clacks for the women's lacrosse data; it achieves 90%, 55.26% and 85.31% as sensitivity, specificity and accuracy respectively.

Conclusion
A sequential learning framework (SLF) integrated with application-specific knowledge works towards knowledge discovery at each step, which is an essential process for deriving relevant information from raw inertial data. The information generated at each stage is passed through the framework and eventually builds a concrete solution to the problem. All three stages defined in the SLF have their own significance, as demonstrated in both case studies. In case study 1, firstly, the gait cycles corresponding to each minute of the 6MW were extracted and the clinical scores were tagged to the respective 6MW. Secondly, DTW-based metrics were computed based on the changes in gait cycles across the six minutes. Finally, using linear regression and stepwise regression models, it was shown that the DTW-based scores are a primary representation of gait pathology. In case study 2, firstly, the impacts were labelled as true hits or clacks with the help of the video footage. Secondly, the stimuli of the impacts were estimated using a linear dynamical model. Finally, a classifier was developed using the stimuli along with logistic regression to differentiate the true hits from the clacks. All the steps employed in the framework refined and transformed the data in their respective ways to provide meaningful information, which eventually addressed the problems. The information derived from both case studies has the potential to be used as a tool to monitor and assess the health of a person. The subsections below summarize the information derived from the case studies.

Case Study 1
Walking speed, DTW Score and Warp Score are closely related to several patient-reported and exam-defined outcome measures. The traditional objective measure, walking speed, has the best correlation with overall disability, as it correlates with cognitive fatigue, bladder function, and other more general symptoms of disease progression. The DTW Score and Warp Score are solely representative of gait pathology, as the DTW Score is primarily a measure of physical fatigue and the Warp Score measures balance and physical fatigue. As a result, DTW-based features are a promising tool for gait assessment in other pathologies causing physical fatigue and/or poor balance. Further, predictive models can be built to estimate the gait pathology of a subject by analyzing their DTW Score and Warp Score. These new features, initially intended for persons with MS, can now be applied to a wide variety of gait-related disorders, giving care providers a precise new tool to quantify specific aspects of walking impairment.

Case Study 2
The estimated stimulus is the best-performing feature for differentiating the true hits from the clacks. This feature had better classification metrics than features such as the original accelerometer signal, the jerk signal and the temporal features. This shows that the stimulus, estimated using a linear dynamical model, provides good discrimination between impacts. The rigorous motion of the players' heads causes the true hits and the clacks to have similar temporal characteristics even though the driving motives behind them are different. For this reason, the stimuli perform better than the temporal features. Thus, the linear dynamical model successfully estimated the driving force from the inertial data.
Moreover, a logistic regression model with the estimated stimuli, as input, is the best classifier for differentiating the true hits from the clacks. This classifier achieves an overall accuracy of 85.31%, sensitivity of 90%, and specificity of 55.26% with 0.63 as a threshold. This classifier also provides a flexibility to manipulate the performance measures by changing the threshold value. If a higher specificity is required, a higher threshold can be chosen. The true hits identified by the classifier can be used to study the relation of direct hits to the head with concussions and brain injury. Furthermore, this classifier can be extended to identify true hits in other contact sports like soccer, football, etc. It also has the potential to be incorporated into on-field screening devices for real-time identification of true hits.

Future Work
The framework presented in this work can be incorporated into a wide range of sensor-based applications for information extraction. However, there are a few aspects of this work that can be revisited, such as building data preparation standards and evaluating the framework itself. Firstly, standards for data preparation will reduce the time and effort needed for cleaning and labelling the raw data. This will also make sharing data easier, and algorithms can be implemented across datasets seamlessly. A few precautionary steps that should be taken during the data collection procedure, which will help reduce effort during data preparation, are time synchronization among modules, reliable calibration of the sensors, and consistency while recording values manually. Secondly, standards for framework evaluation are required because, currently, the correlation between the extracted information and the clinical scores is used as a metric to evaluate the framework. However, this new information may provide a new dimension to the solution that the clinical scores do not consider. Also, "gold standard" clinical scores are not available for most health applications. So having a standard for evaluating frameworks would be a good way to judge the credibility of the information.
The methodology built for case study 1 can be extended to other medical domains where gait pathology is a symptom such as rehabilitation, Parkinson's disease, cerebellar ataxia and many others. The DTW based metrics can be tested in these domains to understand gait disorders. Currently MS subjects visit the hospital every six months for disability assessment. However, this methodology will enable care providers to remotely monitor gait pathology on an ongoing basis.
The approach of case study 2 can be implemented across other sports such as soccer and football, so that every true hit a player sustains during any activity is accounted for. This way, the relation between true hits and traumatic brain injury can be analyzed. Upon understanding the relationship, predictive models can also be built to assess the brain health of players before games. This will also help ensure that players are able to recover from brain injuries between games.

Akaike information criterion (AIC)
AIC is a measure of the relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the performance of each model, relative to each of the other models.
Hence, AIC provides a means for model selection.

AIC = −2 ln(L) + 2k
where L is the maximum of the likelihood function for the model and k is the number of estimated parameters in the model.
The model with the least AIC is preferred. AIC rewards goodness of fit (as assessed by the likelihood function), but it also includes a penalty that increases with the number of estimated parameters.

Bayesian information criterion (BIC)
BIC is a criterion for model selection among a finite set of models; the model with the lowest BIC is preferred. It is closely related to AIC, but BIC penalizes the complexity of the model more.

BIC = −2 ln(L) + k ln(n)
where L is the maximum of the likelihood function for the model, k is the number of estimated parameters in the model and n is the number of observations.
The BIC generally penalizes free parameters more strongly than the AIC. Lower BIC implies either fewer explanatory variables, better fit, or both.
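Both criteria can be computed directly from a model's maximized log-likelihood; the numbers below are illustrative only:

```python
import numpy as np

def aic_bic(log_likelihood, k, n):
    """AIC = -2 ln(L) + 2k;  BIC = -2 ln(L) + k ln(n)."""
    aic = -2.0 * log_likelihood + 2.0 * k
    bic = -2.0 * log_likelihood + k * np.log(n)
    return aic, bic

# Illustrative values: a model with maximized log-likelihood -100,
# k = 3 estimated parameters, fitted on n = 50 observations.
aic, bic = aic_bic(-100.0, k=3, n=50)
# Since ln(50) > 2, BIC penalizes the 3 parameters more than AIC does.
```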

p value
The p-value is used for null hypothesis testing. The p-value is the probability of finding the observed results when the null hypothesis is true. A null hypothesis proposes that there is no relationship between the response variable and the features. So, if the p-value is less than a chosen significance level, the null hypothesis can be rejected. In other words, there is evidence of a relationship between the features and the response variable.

B. Classification Metrics Sensitivity
Sensitivity is a statistical measure of the performance for a binary classifier test. It refers to the test's ability to correctly detect the positive cases. In the example of a medical test used to identify a disease, the sensitivity of the test is the proportion of people who test positive for the disease among those who have the disease.

Specificity
Specificity is a statistical measure of the performance of a binary classifier test. It relates to the test's ability to correctly detect the negative cases. Consider the example of a medical test for diagnosing a disease. The specificity of the test is the proportion of identified true negatives among the patients known not to have the disease.
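Both metrics follow directly from the confusion-matrix counts; the counts below are illustrative, not the thesis data:

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)              # true-positive rate
    specificity = tn / (tn + fp)              # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Illustrative counts only (not the thesis data): 10 positives, 20 negatives.
sens, spec, acc = classification_metrics(tp=8, fn=2, tn=15, fp=5)
```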

ROC & AUROC
Receiver Operating Characteristic (ROC) curve is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true positive rate is also known as sensitivity, and the false positive rate is the fall-out (i.e. 1 − specificity). The ROC curve is thus the sensitivity as a function of the fall-out.
Area under the Receiver Operative Characteristic (AUROC) curve is a measure that is related to the accuracy of a classifier. AUROC is the area under the curve in a ROC plot. An area of 1 represents a perfect classifier; an area of 0.5 represents random guessing.
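AUROC can also be computed without drawing the curve, via its rank interpretation (the probability that a random positive outscores a random negative); the scores below are toy values:

```python
def auroc(scores, labels):
    """AUROC via the rank (Mann-Whitney) formulation: the probability that
    a random positive outscores a random negative, counting ties as 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect ranking scores 1.0; identical scores for every impact score 0.5.
perfect = auroc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
chance = auroc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```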

C. Diagnostic Curves
Diagnostic curves are used to evaluate the model assumptions and investigate whether or not there are observations with a large, undue influence on the analysis.

Residuals vs Fitted
This plot shows if residuals have non-linear patterns. There could be a non-linear relationship between predictor variables and an outcome variable and the pattern could be visible in this plot. If there are equally spread residuals around a horizontal line without distinct patterns, it indicates that they do not have non-linear relationships.

Normal Q-Q
The normal Q-Q plot, or quantile-quantile plot, is a graphical tool used to assess whether a set of data plausibly came from a normal distribution. A Q-Q plot is a scatterplot created by plotting two sets of quantiles against one another. If both sets of quantiles come from the same distribution, the points form an approximately straight line.

Scale-Location
This plot shows if the residuals are spread equally along the ranges of predictors. This is used to check the assumption of equal variance (homoscedasticity). The model is homoscedastic if the points are equally (randomly) spread out in the plot.

Residuals vs Leverage
This plot is used to identify influential observations. Such observations can be very influential in manipulating the model even when their values lie within a reasonable range, and excluding them from the analysis can alter the results.

D. Statistical measures of the clinical scores
Initially, the t and p statistics of the clinical scores from the stepwise regression model were calculated.
Then the absolute averages of the t values and the averages of the p values, corresponding to the frequently occurring clinical scores across the stepwise regression models, were computed. A clinical score with a higher t statistic has a stronger relationship with the gait feature. The absolute averages of the t statistics for all the clinical scores mentioned in Table 13 are approximately equal to or greater than 2. The corresponding p values were derived to assess the strength of the relationship between the clinical scores and the respective gait features. If the p value is less than 0.05, the relationship between the clinical score and the gait feature is statistically significant at the 5% level. The average p values of the clinical scores shown in Table 13 are nearly equal to or less than 0.05, which provides evidence that the clinical scores are related to the respective gait features.
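The t statistic for an OLS coefficient can be sketched as follows; the synthetic predictors below are assumptions for illustration, not the clinical sub-scores:

```python
import numpy as np

def ols_t_stats(X, y):
    """t statistic for each OLS coefficient: t_j = w_j / SE(w_j), with
    SE taken from sigma^2 (X^T X)^{-1} and sigma^2 = RSS / (n - k)."""
    n, k = X.shape
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ w) ** 2))
    sigma2 = rss / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return w / se

# Synthetic example: x1 drives the response strongly, x2 is unrelated.
rng = np.random.default_rng(3)
x1, x2 = rng.normal(size=100), rng.normal(size=100)
X = np.column_stack([np.ones(100), x1, x2])
y = 3.0 * x1 + rng.normal(size=100)
t = ols_t_stats(X, y)
```

The informative predictor yields a |t| far above 2, while the unrelated one typically stays below it, mirroring the threshold used in Table 13.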