Human Activity Classification based on Smartphone Sensor Signals
Classify using supervised learning (FULL script)
Contents
- Supervised Learning
- Clear all variables that are not relevant & load pre-saved variables
- Data preparation
- Statistical Feature Importance (parametric)
- Train Classification Tree
- Train Multiclass SVM
- Train k-Nearest Neighbor
- Set-Up & Train Naive Bayes
- Set-Up & Train Neural Network
- Train network on reduced feature set (Optional)
- Compare Models
- Save models
Supervised Learning
In this section we apply a supervised learning approach (the output labels are known) and iterate over several supervised learning algorithms.
Clear all variables that are not relevant & load pre-saved variables
Clear non-relevant variables and load the pre-saved data.
clear; clc
% Load set of feature vectors (feat) and cell array of feature names
% (featlabels)
load('Data\Prepared_iPhone_32\BufferFeatures60.mat')
% Load buffered signals (here only using known activity IDs for buffers)
load('Data\Prepared_iPhone_32\BufferedAccelerations.mat')
Data preparation
Activities = categorical(actid,(1:numel(actnames)),actnames);

% Cross Validation
cv = cvpartition(length(actid),'holdout',0.1); % 10% size for Test

% Training set
Xtrain = feat(training(cv),:);
Ytrain = Activities(training(cv));
% Test set
Xtest = feat(test(cv),:);
Ytest = Activities(test(cv));

Dataset_train = [array2table(Xtrain) table(Ytrain)];
Dataset_test = array2table(Xtest);
Dataset_train.Properties.VariableNames = [featlabels' 'Activities'];
Dataset_test.Properties.VariableNames = featlabels;

% View some stats on some features
grpstats(Dataset_train,'Activities','mean','Datavars',featlabels(1:6))
ans =

                Activities    GroupCount    mean_TotalAccXMean    mean_TotalAccYMean    mean_TotalAccZMean    mean_BodyAccXRMS    mean_BodyAccYRMS    mean_BodyAccZRMS
                __________    __________    __________________    __________________    __________________    ________________    ________________    ________________
    Sitting     Sitting         5260             -3.2669              -3.1347                7.4826              0.044261            0.044248            0.033736
    Standing    Standing        5589             0.30591              -4.4346             0.0045702               0.06581            0.038985            0.062751
    Walking     Walking         4872            0.014923              -1.3015               0.58783                2.3559              3.3679              2.7939
    Running     Running         3555             0.56194              -10.157                1.4732                5.4044              11.397              5.9691
    Dancing     Dancing         2392             0.56227              -9.5047                1.7717                3.7561              12.895               4.415
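Note that cvpartition(length(actid),'holdout',0.1) splits on observation indices only, so class proportions are not guaranteed to carry over to the test set. A minimal sketch of a stratified alternative (cvStrat, XtrainS, and the other names below are introduced here purely for illustration and are not part of the workflow):

% Sketch: stratified holdout split. Passing the class labels instead of the
% number of observations makes cvpartition preserve class proportions in both subsets.
cvStrat = cvpartition(Activities,'HoldOut',0.1);   % 10% test, stratified by class
XtrainS = feat(training(cvStrat),:);
YtrainS = Activities(training(cvStrat));
XtestS  = feat(test(cvStrat),:);
YtestS  = Activities(test(cvStrat));
% Check that class frequencies are similar in the two subsets
tabulate(YtrainS)
tabulate(YtestS)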
Statistical Feature Importance (parametric)
The complexity of the model and its tendency to overfit can be reduced by limiting the number of features included. A metric such as a paired t-test can be used to identify the top features that separate each pair of classes. We can then investigate later whether a model trained on only these features performs comparably with a model trained on the entire feature set.
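The featureScore helper used below encapsulates this ranking. A minimal sketch of the underlying idea for one pair of activities, using the standard two-sample ttest2 (the actual helper may differ in how it pairs classes and how many features it returns; classA, classB, rankIdx, and topN are illustrative names):

% Sketch: rank features for one pair of activities by two-sample t-statistic
classA = feat(actid == 1,:);                 % e.g. Sitting buffers
classB = feat(actid == 2,:);                 % e.g. Standing buffers
[~,~,~,stats] = ttest2(classA, classB);      % column-wise two-sample t-test
[~, rankIdx] = sort(abs(stats.tstat),'descend');
topN = 10;                                   % keep the 10 most separating features
disp(featlabels(rankIdx(1:topN)))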
[featIdx, pair] = featureScore(feat, actid, 10);

disp('Identified Discriminative features (using paired t-test):');
for i = 1:size(pair,1)
    fprintf('%d and %d: ', pair(i,1), pair(i,2));
    fprintf('%s, ', featlabels{featIdx(i,:)});
    fprintf('\n');
end

% Build graph showcasing the top features that dissociate activities
[connections,nodeOutmap,nodeFeatmap] = build_connections(pair,featIdx);
ImportantFeat = cellstr(categorical(unique(featIdx),(1:numel(featlabels)),featlabels));
G = table([connections(:,1) connections(:,2)],ones(size(connections,1),1),...
    repmat({'t-test Interaction'},size(connections,1),1),...
    'VariableNames',{'EndNodes' 'Weight' 'Code'});
G = graph(G,table([actnames';ImportantFeat],'VariableNames',{'Name'}));

colormap hsv
nColors = degree(G);
nSizes = 6*sqrt(nColors-min(nColors)+0.2);
plot(G,'MarkerSize',nSizes,'NodeCData',nColors,'EdgeAlpha',0.1,'Layout','force');
set(gca,'XColor','w','YColor','w');
box off
Identified Discriminative features (using paired t-test):
1 and 2: TotalAccZMean, TotalAccXMean, BodyAccZSpectPos2, BodyAccZSpectPos3, BodyAccZSpectPos4, BodyAccZSpectPos5, BodyAccZSpectPos1, BodyAccZSpectPos6, BodyAccZCovFirstPos, BodyAccXSpectPos2,
1 and 3: BodyAccZRMS, BodyAccXRMS, BodyAccYRMS, TotalAccZMean, BodyAccZPowerBand1, BodyAccZCovZeroValue, BodyAccXCovZeroValue, BodyAccXPowerBand2, BodyAccZPowerBand2, BodyAccYCovZeroValue,
1 and 4: BodyAccYRMS, BodyAccXRMS, BodyAccZRMS, BodyAccYCovZeroValue, BodyAccYPowerBand2, BodyAccXCovZeroValue, TotalAccYMean, BodyAccXPowerBand2, TotalAccZMean, BodyAccZCovZeroValue,
1 and 5: BodyAccYRMS, BodyAccXRMS, BodyAccYPowerBand2, BodyAccYCovZeroValue, BodyAccZRMS, BodyAccYCovFirstValue, BodyAccXPowerBand2, TotalAccZMean, TotalAccYMean, BodyAccXCovZeroValue,
2 and 3: BodyAccZRMS, BodyAccXRMS, BodyAccYRMS, BodyAccZCovZeroValue, BodyAccZPowerBand1, BodyAccXCovZeroValue, BodyAccXPowerBand2, BodyAccZPowerBand2, BodyAccYCovZeroValue, BodyAccYPowerBand2,
2 and 4: BodyAccYRMS, BodyAccXRMS, BodyAccZRMS, BodyAccYCovZeroValue, BodyAccYPowerBand2, BodyAccXCovZeroValue, BodyAccXPowerBand2, BodyAccZCovZeroValue, BodyAccZPowerBand2, BodyAccYCovFirstValue,
2 and 5: BodyAccYRMS, BodyAccXRMS, BodyAccYPowerBand2, BodyAccYCovZeroValue, BodyAccYCovFirstValue, BodyAccZRMS, BodyAccXPowerBand2, BodyAccXCovZeroValue, BodyAccXSpectVal6, BodyAccYSpectVal6,
3 and 4: BodyAccYRMS, BodyAccXRMS, BodyAccYPowerBand2, BodyAccYCovZeroValue, BodyAccXCovZeroValue, BodyAccZRMS, BodyAccXPowerBand2, BodyAccYCovFirstValue, BodyAccZPowerBand2, BodyAccZCovZeroValue,
3 and 5: BodyAccYRMS, BodyAccYPowerBand2, BodyAccYCovZeroValue, BodyAccYCovFirstValue, BodyAccYSpectVal6, BodyAccXRMS, TotalAccZMean, BodyAccXCovZeroValue, BodyAccXPowerBand2, BodyAccYSpectVal3,
4 and 5: BodyAccXRMS, BodyAccYCovFirstPos, BodyAccXPowerBand2, BodyAccXCovZeroValue, BodyAccYCovFirstValue, BodyAccZRMS, BodyAccXSpectVal6, BodyAccXSpectVal3, BodyAccXSpectVal4, BodyAccXSpectVal5,
Train Classification Tree
tic
myTree = fitctree(Dataset_train,'Activities','SplitCriterion', 'gdi', ...
    'MaxNumSplits', 20, 'Surrogate', 'off');
toc

% Predict on Training & Test sets
Y_CT_train = predict(myTree,Dataset_train);
Y_CT_test = predict(myTree,Dataset_test);

view(myTree,'mode','graph')

% Calculate confusion matrices using prediction results
C_CT = confusionmat(Ytest,Y_CT_test);
C_CT_train = confusionmat(Ytrain,Y_CT_train);
Elapsed time is 2.761747 seconds.
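The value of 20 for MaxNumSplits above is a judgment call. One way to sanity-check it is to cross-validate the fitted tree (a sketch assuming the default 10 folds; cvTree and treeCVLoss are illustrative names):

% Sketch: estimate the tree's generalization error with 10-fold cross-validation
cvTree = crossval(myTree);                   % ClassificationPartitionedModel
treeCVLoss = kfoldLoss(cvTree);              % average misclassification rate
fprintf('Cross-validated tree loss: %.2f%%\n', 100*treeCVLoss);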
Train Multiclass SVM
templateSVM = templateSVM('KernelFunction', 'linear', 'PolynomialOrder', [], ...
    'KernelScale', 'auto', 'BoxConstraint', 1, 'Standardize', true);

tic
mySVM = fitcecoc(Dataset_train,'Activities','Learners', templateSVM, ...
    'Coding', 'onevsall');
toc

Y_SVM_train = predict(mySVM,Dataset_train);
Y_SVM_test = predict(mySVM,Dataset_test);

C_SVM = confusionmat(Ytest,Y_SVM_test);
C_SVM_train = confusionmat(Ytrain,Y_SVM_train);
Elapsed time is 47.138819 seconds.
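As with the tree, the ECOC model can be cross-validated for a less optimistic error estimate. Note that this refits one model per fold, so expect several times the training time shown above (a sketch; cvSVM and svmCVLoss are illustrative names):

% Sketch: 5-fold cross-validation of the ECOC SVM model
cvSVM = crossval(mySVM,'KFold',5);
svmCVLoss = kfoldLoss(cvSVM);
fprintf('Cross-validated SVM loss: %.2f%%\n', 100*svmCVLoss);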
Train k-Nearest Neighbor
tic
myKNN = fitcknn(Dataset_train,'Activities','Distance', 'Hamming','Exponent',[],...
    'NumNeighbors', 10, 'DistanceWeight', 'Inverse', 'Standardize', false);
toc

Y_KNN_train = predict(myKNN,Dataset_train);
Y_KNN_test = predict(myKNN,Dataset_test);

C_KNN = confusionmat(Ytest,Y_KNN_test);
C_KNN_train = confusionmat(Ytrain,Y_KNN_train);
Elapsed time is 0.146296 seconds.
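The Hamming distance counts feature mismatches and is aimed at binary or categorical data; since these features are continuous, a Euclidean-distance variant with standardization may be worth comparing (a sketch only; myKNNeuc is an illustrative name and not part of the original workflow):

% Sketch: kNN variant using Euclidean distance on standardized features
myKNNeuc = fitcknn(Dataset_train,'Activities','Distance','euclidean', ...
    'NumNeighbors',10,'DistanceWeight','inverse','Standardize',true);
Y_KNNeuc_test = predict(myKNNeuc,Dataset_test);
fprintf('Euclidean kNN test error: %.2f%%\n', 100*mean(Y_KNNeuc_test ~= Ytest));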
Set-Up & Train Naive Bayes
tic
myNB = fitcnb(Dataset_train,'Activities','Distribution','normal');
toc

% Predict on Training & Test sets
Y_NB_train = predict(myNB,Dataset_train);
Y_NB_test = predict(myNB,Dataset_test);

C_NB = confusionmat(Ytest,Y_NB_test);
C_NB_train = confusionmat(Ytrain,Y_NB_train);
Elapsed time is 43.948930 seconds.
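The 'normal' option assumes each feature is Gaussian within each class, which is questionable for some of the spectral features. A kernel-density variant is one hedge against that assumption, at the cost of slower training and prediction (a sketch; myNBkernel is an illustrative name):

% Sketch: Naive Bayes with kernel-smoothed feature densities instead of normals
myNBkernel = fitcnb(Dataset_train,'Activities','DistributionNames','kernel');
Y_NBk_test = predict(myNBkernel,Dataset_test);
fprintf('Kernel Naive Bayes test error: %.2f%%\n', 100*mean(Y_NBk_test ~= Ytest));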
Set-Up & Train Neural Network
Create Dummy Variable Groups
Ytrain_bingrps = dummyvar(Ytrain);

% Reset random number generators (for repeatability)
rng default

% Initialize a Neural Network with 15 nodes in hidden layer
net = patternnet(15);
net.divideParam.trainRatio = 90/100;
net.divideParam.valRatio = 5/100;
net.divideParam.testRatio = 5/100;
% net.trainFcn = 'trainbr'; % Bayesian Regularization backpropagation.

p = gcp('nocreate');
if isempty(p)
    p = parpool('local');
end

% For details about customizing the training function refer to
% web(fullfile(docroot, 'nnet/ug/choose-a-multilayer-neural-network-training-function.html'))
tic
net = train(net, Xtrain', Ytrain_bingrps','UseParallel','always');
toc

% Predict on Training set
outputs_train = net(Xtrain');
[~, Y_NN_train] = max(outputs_train,[],1);
Y_NN_train = categorical(Y_NN_train,1:length(actnames),actnames)';

% Predict on Test set
outputs_test = net(Xtest');
[~, Y_NN_test] = max(outputs_test,[],1);
Y_NN_test = categorical(Y_NN_test,1:length(actnames),actnames)';

C_NN = confusionmat(Ytest,Y_NN_test);
C_NN_train = confusionmat(Ytrain,Y_NN_train);
Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.
Elapsed time is 17.427224 seconds.
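If the plotconfusion function from the same toolbox is available, the network's test-set confusion matrix can also be drawn directly from one-hot targets and raw outputs (a sketch reusing outputs_test from above; Ytest_bingrps is an illustrative name):

% Sketch: toolbox confusion-matrix plot for the network on the test set
Ytest_bingrps = dummyvar(Ytest);             % one-hot targets, one column per class
plotconfusion(Ytest_bingrps', outputs_test);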
Train network on reduced feature set (Optional)
netRed = patternnet(15); % Initialize network
netRed.divideParam.trainRatio = 90/100;
netRed.divideParam.valRatio = 5/100;
netRed.divideParam.testRatio = 5/100;

redF = unique(featIdx(:));
XtrainRed = Xtrain(:,redF);
XtestRed = Xtest(:,redF);

tic
netRed = train(netRed, XtrainRed', Ytrain_bingrps','UseParallel','always');
toc

outputs_train_red = netRed(XtrainRed');
[~, Y_NN_train_red] = max(outputs_train_red,[],1);
Y_NN_train_red = categorical(Y_NN_train_red,1:length(actnames),actnames)';

outputs_test_red = netRed(XtestRed');
[~, Y_NN_test_red] = max(outputs_test_red,[],1);
Y_NN_test_red = categorical(Y_NN_test_red,1:length(actnames),actnames)';

C_NN_red = confusionmat(Ytest,Y_NN_test_red);
C_NN_red_train = confusionmat(Ytrain,Y_NN_train_red);
Elapsed time is 11.243019 seconds.
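For reference, it can be useful to report how much smaller the reduced feature set actually is (a small sketch):

% Sketch: size of the reduced feature set versus the full set
fprintf('Reduced network uses %d of %d features.\n', numel(redF), size(feat,2));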
Compare Models
Prediction performance can be visualized in a number of ways. Below we show, for each model, the confusion matrix (a square matrix that tallies the cumulative prediction results for every pairing of actual and predicted classes) together with the overall misclassification rates on both the training and test sets.
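Since the misclassification rate is simply 100*(1 - trace(C)/sum(C(:))) for a confusion matrix C, a small anonymous-function helper could express the same computation more compactly (a sketch only; the original code below spells it out per model):

% Sketch: misclassification rate (%) from a confusion matrix
missRate = @(C) 100*(1 - trace(C)/sum(C(:)));
% e.g. missRate(C_CT) reproduces perfTest(1) computed below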
% Overall misclassification rates
perfTest(1) = 100 - sum(sum(diag(C_CT)))/sum(sum(C_CT))*100;
perfTest(2) = 100 - sum(sum(diag(C_SVM)))/sum(sum(C_SVM))*100;
perfTest(3) = 100 - sum(sum(diag(C_KNN)))/sum(sum(C_KNN))*100;
perfTest(4) = 100 - sum(sum(diag(C_NB)))/sum(sum(C_NB))*100;
perfTest(5) = 100 - sum(sum(diag(C_NN)))/sum(sum(C_NN))*100;
perfTest(6) = 100 - sum(sum(diag(C_NN_red)))/sum(sum(C_NN_red))*100;

perfTrain(1) = 100 - sum(sum(diag(C_CT_train)))/sum(sum(C_CT_train))*100;
perfTrain(2) = 100 - sum(sum(diag(C_SVM_train)))/sum(sum(C_SVM_train))*100;
perfTrain(3) = 100 - sum(sum(diag(C_KNN_train)))/sum(sum(C_KNN_train))*100;
perfTrain(4) = 100 - sum(sum(diag(C_NB_train)))/sum(sum(C_NB_train))*100;
perfTrain(5) = 100 - sum(sum(diag(C_NN_train)))/sum(sum(C_NN_train))*100;
perfTrain(6) = 100 - sum(sum(diag(C_NN_red_train)))/sum(sum(C_NN_red_train))*100;

figure
bar([perfTrain;perfTest]');
set(gca,'XTickLabel', {'Trees', 'SVM', 'kNN', 'Naive Bayes','NN','Reduced NN'});
legend('Training','Test')
ylabel('Misclassification Rates (%)');
title('Comparison of models on training & test set');

% Confusion Matrices on Test sets
figure
subplot(2,3,1);
heatmap(C_CT*100./repmat(sum(C_CT,2),1,size(C_CT,2)),actnames,actnames,'%10.1f%%', ...
    'TickAngle',60,'TickFontSize',10,'colormap','lines','ShowAllTicks',true);
title(['\bf Classification Tree (' num2str(perfTest(1)) '%)']);
ylabel('Predicted Classes');

subplot(2,3,2);
heatmap(C_SVM*100./repmat(sum(C_SVM,2),1,size(C_SVM,2)),actnames,actnames,'%10.1f%%', ...
    'TickAngle',60,'TickFontSize',10,'colormap','lines','ShowAllTicks',true);
title(['\bf Multiclass SVM (' num2str(perfTest(2)) '%)']);

subplot(2,3,3);
heatmap(C_NN*100./repmat(sum(C_NN,2),1,size(C_NN,2)),actnames,actnames,'%10.1f%%', ...
    'TickAngle',60,'TickFontSize',10,'colormap','lines','ShowAllTicks',true);
title(['\bf Neural Networks (' num2str(perfTest(5)) '%)']);

subplot(2,3,4);
heatmap(C_NB*100./repmat(sum(C_NB,2),1,size(C_NB,2)),actnames,actnames,'%10.1f%%', ...
    'TickAngle',60,'TickFontSize',10,'colormap','lines','ShowAllTicks',true);
title(['\bf Naive Bayes (' num2str(perfTest(4)) '%)']);
ylabel('Predicted Classes');

subplot(2,3,5);
heatmap(C_KNN*100./repmat(sum(C_KNN,2),1,size(C_KNN,2)),actnames,actnames,'%10.1f%%', ...
    'TickAngle',60,'TickFontSize',10,'colormap','lines','ShowAllTicks',true);
title(['\bf K-Nearest Neighbour (' num2str(perfTest(3)) '%)']);

subplot(2,3,6);
heatmap(C_NN_red*100./repmat(sum(C_NN_red,2),1,size(C_NN_red,2)),actnames,actnames,'%10.1f%%', ...
    'TickAngle',60,'TickFontSize',10,'colormap','lines','ShowAllTicks',true);
title(['\bf Reduced Neural Networks (' num2str(perfTest(6)) '%)']);
Save models
save Data\Prepared_iPhone_32\TrainedNetwork net netRed mySVM myTree myKNN myNB