This is my project for the “Getting and Cleaning Data” course.
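Contents
1.Load the data in R
2.Merges the training and the test sets to create one data set.
3.Extracts only the measurements on the mean and standard deviation for each measurement.
4.Uses descriptive activity names to name the activities in the data set
5.Appropriately labels the data set with descriptive variable names.
6.From the data set in step 5, creates a second, independent tidy data set with the average of each variable for each activity and each subject.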
Here are the data for the project:
https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
One of the most exciting areas in all of data science right now is wearable computing. Companies like Fitbit, Nike, and Jawbone Up are racing to develop the most advanced algorithms to attract new users. The data linked from the course website were collected from the accelerometers of the Samsung Galaxy S smartphone.
This time I downloaded the archive into my working directory by hand, which made it easier to read the README.txt. If you want to know more about the download process in R, see my other post, "Getting the raw data you want with R: how to download".
1.Load the data in R
Here the dataset is already in my working directory. If you would rather download it from R code and want to know how, see: how to LOAD the data.
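For readers who prefer to fetch the archive from R, here is a minimal sketch of one way to do it. The helper name `fetch_har_data` and the local zip file name are my own assumptions, not part of the original script. It skips the download when the unpacked folder is already present.

```r
# Hypothetical helper (not in the original script): download and unpack
# the archive unless the unpacked folder already exists.
fetch_har_data <- function(url, dest_dir = ".") {
  zip_path <- file.path(dest_dir, "UCI_HAR_Dataset.zip")  # assumed local file name
  data_dir <- file.path(dest_dir, "UCI HAR Dataset")       # folder name inside the archive
  if (!dir.exists(data_dir)) {
    if (!file.exists(zip_path)) {
      # mode = "wb" keeps the binary zip file intact on Windows
      download.file(url, destfile = zip_path, mode = "wb")
    }
    unzip(zip_path, exdir = dest_dir)
  }
  data_dir  # return the path so it can be passed to setwd()
}

# Usage (not run here, the archive is large):
# url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
# setwd(fetch_har_data(url))
```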
# set the unzipped dataset folder as the working directory
setwd("C:/Users/zhong/Desktop/coursera/R/UCI HAR Dataset")
#load the data
train_x <- read.table("./train/X_train.txt")
train_y <- read.table("./train/y_train.txt")
train_subject <- read.table("./train/subject_train.txt")
test_x <- read.table("./test/X_test.txt")
test_y <- read.table("./test/y_test.txt")
test_subject <- read.table("./test/subject_test.txt")
2.Merges the training and the test sets to create one data set.
#combine the data
trainData <- cbind(train_subject, train_y, train_x)
testData <- cbind(test_subject, test_y, test_x)
#merge the train and test data
MergeData <- rbind(trainData, testData)
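To make the shape of the merge concrete, here is a small sketch on made-up stand-in tables (three training rows, two test rows, two features). The values are invented for illustration. The column names V1, V2 mimic what read.table assigns by default.

```r
# Toy stand-ins for the real files (values are made up)
train_subject <- data.frame(V1 = c(1, 1, 3))
train_y       <- data.frame(V1 = c(5, 4, 2))
train_x       <- data.frame(V1 = c(0.28, 0.27, 0.29), V2 = c(-0.02, -0.01, -0.03))
test_subject  <- data.frame(V1 = c(2, 2))
test_y        <- data.frame(V1 = c(1, 6))
test_x        <- data.frame(V1 = c(0.25, 0.26), V2 = c(-0.04, -0.05))

# cbind puts subject, activity and measurements side by side (columns)
trainData <- cbind(train_subject, train_y, train_x)
testData  <- cbind(test_subject, test_y, test_x)

# rbind stacks the test rows under the training rows
MergeData <- rbind(trainData, testData)
dim(MergeData)  # 5 rows, 4 columns: subject, activity, 2 features
```

Column 1 is the subject and column 2 the activity code, so the measurements start at column 3. That is why the next step shifts the feature index by 2.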
3.Extracts only the measurements on the mean and standard deviation for each measurement.
# Extract only the measurements on the mean and standard deviation for each measurement.
## read the feature names (second column of features.txt)
Feature <- read.table("./features.txt", stringsAsFactors = FALSE)[, 2]
## find the features whose names contain mean() or std()
FeatureIndex <- grep("mean\\(\\)|std\\(\\)", Feature)
## keep subject, activity and the selected features (shifted by the 2 leading columns)
DATA <- MergeData[, c(1, 2, FeatureIndex + 2)]
colnames(DATA) <- c("subject", "activity", Feature[FeatureIndex])
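The escaped parentheses in the pattern matter: they keep names such as meanFreq() and the angle(...) features out of the selection. A small sketch on a handful of feature names (taken from features.txt, but put in an arbitrary order here for illustration):

```r
# A few feature names; the order here is illustrative, not the file's order
Feature <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X",
             "fBodyAcc-meanFreq()-X", "angle(tBodyAccMean,gravity)")

# "\\(\\)" matches a literal "()", so only mean() and std() are selected
FeatureIndex <- grep("mean\\(\\)|std\\(\\)", Feature)
FeatureIndex           # 1 2
Feature[FeatureIndex]  # "tBodyAcc-mean()-X" "tBodyAcc-std()-X"

# In the merged table the features start at column 3, hence the shift
FeatureIndex + 2       # 3 4
```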
4.Uses descriptive activity names to name the activities in the data set
# Use descriptive activity names to name the activities in the data set
## read the activity labels (column 1: code, column 2: name)
ActivityName <- read.table("./activity_labels.txt")
## replace the numeric codes with the activity names
DATA$activity <- factor(DATA$activity, levels = ActivityName[,1], labels = ActivityName[,2])
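Here is a small sketch of how factor() maps the codes to names. The lookup table is typed in by hand to match activity_labels.txt, and the vector `codes` is a made-up sample.

```r
# Lookup table with the same layout as activity_labels.txt
ActivityName <- data.frame(
  V1 = 1:6,
  V2 = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
         "SITTING", "STANDING", "LAYING")
)

codes <- c(5, 1, 6, 1)  # made-up sample of activity codes

# levels says which codes exist, labels says what each one should be called
activity <- factor(codes, levels = ActivityName[, 1], labels = ActivityName[, 2])
as.character(activity)  # "STANDING" "WALKING" "LAYING" "WALKING"
```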
5.Appropriately labels the data set with descriptive variable names.
# Appropriately label the data set with descriptive variable names.
## drop the "()" from the names
names(DATA) <- gsub("\\(\\)", "", names(DATA))
## expand the leading t / f prefixes
names(DATA) <- gsub("^t", "time", names(DATA))
names(DATA) <- gsub("^f", "frequency", names(DATA))
## turn "-mean" / "-std" into capitalized suffixes
names(DATA) <- gsub("-mean", "Mean", names(DATA))
names(DATA) <- gsub("-std", "Std", names(DATA))
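A quick sketch of what the renaming chain does to a few sample column names. The `^` anchor keeps the first two substitutions from touching "subject" and "activity". The `f` prefix is expanded to the spelling "frequency" in this sketch.

```r
nm <- c("subject", "activity", "tBodyAcc-mean()-X", "fBodyGyro-std()-Z")

nm <- gsub("\\(\\)", "", nm)       # drop the literal "()"
nm <- gsub("^t", "time", nm)       # only a leading t is expanded
nm <- gsub("^f", "frequency", nm)  # only a leading f is expanded
nm <- gsub("-mean", "Mean", nm)
nm <- gsub("-std", "Std", nm)
nm
# "subject" "activity" "timeBodyAccMean-X" "frequencyBodyGyroStd-Z"
```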
6.From the data set in step 5, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
# From the data set in step 5, create a second, independent tidy data set with the average of each variable for each activity and each subject.
## aggregate() comes from the stats package, so no extra library is needed
tidyData <- aggregate(. ~ subject + activity, DATA, mean)
tidyData <- tidyData[order(tidyData$subject, tidyData$activity), ]
# save the clean, tidy data
write.table(tidyData, file = "tidyData.txt", row.names = FALSE)
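To see what the aggregation produces, here is a sketch on a made-up table with two subjects and one measurement column. It also writes the result to a temporary file and reads it back with header = TRUE, which is how a file written this way can be loaded again.

```r
# Made-up input: subject 1 walks twice and sits once, subject 2 walks twice
DATA <- data.frame(
  subject  = c(1, 1, 1, 2, 2),
  activity = factor(c("WALKING", "WALKING", "SITTING", "WALKING", "WALKING"),
                    levels = c("WALKING", "SITTING")),
  timeBodyAccMeanX = c(0.25, 0.75, 0.125, 0.5, 1.0)
)

# One row per subject/activity pair, each measurement replaced by its mean
tidyData <- aggregate(. ~ subject + activity, DATA, mean)
tidyData <- tidyData[order(tidyData$subject, tidyData$activity), ]
nrow(tidyData)             # 3
tidyData$timeBodyAccMeanX  # 0.5 0.125 0.75

# Round trip through a file (a temporary path is used for this sketch)
out <- tempfile(fileext = ".txt")
write.table(tidyData, file = out, row.names = FALSE)
check <- read.table(out, header = TRUE)
```

On the real data the result has one row for every subject/activity combination that occurs in the merged table.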
More info and code updates: https://github.com/kidpea/Preparation-OF-Data-Analysis.Data-from-Samsung-Galaxy-S-smartphone-/blob/master/run_analysis.R