Preparation of Data Analysis: Data from the Samsung Galaxy S Smartphone



This is my "Getting and Cleaning Data" course project.

Contents

1. Load the data in R

2. Merges the training and the test sets to create one data set.

3. Extracts only the measurements on the mean and standard deviation for each measurement.

4. Uses descriptive activity names to name the activities in the data set.

5. Appropriately labels the data set with descriptive variable names.

6. From the data set in step 5, creates a second, independent tidy data set with the average of each variable for each activity and each subject.


Here are the data for the project:

https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip


One of the most exciting areas in all of data science right now is wearable computing. Companies like Fitbit, Nike, and Jawbone Up are racing to develop the most advanced algorithms to attract new users. The data linked from the course website were collected from the accelerometers of the Samsung Galaxy S smartphone.

This time I downloaded the archive into my working directory by hand, which makes it easier to read the README.txt. If you want to know more about the download process in R, see my other post: "Getting the raw data you want with R: how to download".


 

1. Load the data in R

Here the dataset is already downloaded into my working directory. If you would rather download it from R code, see: "how to LOAD the data".
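For readers who prefer to script the download, the following is a minimal sketch rather than the code from the linked post. The helper name `fetch_dataset`, its `dest_dir` argument, and the local file name `dataset.zip` are all hypothetical; it also assumes the archive unpacks into a folder called `UCI HAR Dataset`, which matches the path used in `setwd()` in this post.

```r
# Sketch: fetch and unpack the archive only if it is not already present.
# fetch_dataset, dest_dir and "dataset.zip" are hypothetical names.
fetch_dataset <- function(url, dest_dir = ".") {
  zip_path <- file.path(dest_dir, "dataset.zip")
  data_dir <- file.path(dest_dir, "UCI HAR Dataset")  # assumed folder name
  if (!dir.exists(data_dir)) {
    # mode = "wb" keeps the zip file intact on Windows
    download.file(url, destfile = zip_path, mode = "wb")
    unzip(zip_path, exdir = dest_dir)
  }
  data_dir
}

# Example call (needs network access), using the project URL given above:
# data_dir <- fetch_dataset(
#   "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
# )
# setwd(data_dir)
```

Skipping the download when the folder already exists means the script can be rerun without fetching the archive again.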

#set the working directory to the extracted dataset folder
setwd("C:/Users/zhong/Desktop/coursera/R/UCI HAR Dataset")

#load the data
train_x <- read.table("./train/X_train.txt")
train_y <- read.table("./train/y_train.txt")
train_subject <- read.table("./train/subject_train.txt")
test_x <- read.table("./test/X_test.txt")
test_y <- read.table("./test/y_test.txt")
test_subject <- read.table("./test/subject_test.txt")
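The three files in each folder describe the same observations row by row, so it is worth confirming that their row counts agree before combining them. The sketch below uses small made-up files in a temporary directory instead of the real dataset; the file names `X.txt`, `y.txt` and `subject.txt` are placeholders.

```r
# Toy stand-ins for X_train.txt, y_train.txt and subject_train.txt
tmp <- tempfile()
dir.create(tmp)
writeLines(c("0.1 0.2 0.3", "0.4 0.5 0.6"), file.path(tmp, "X.txt"))
writeLines(c("5", "4"), file.path(tmp, "y.txt"))
writeLines(c("1", "1"), file.path(tmp, "subject.txt"))

# read.table splits on whitespace by default and names columns V1, V2, ...
x <- read.table(file.path(tmp, "X.txt"))
y <- read.table(file.path(tmp, "y.txt"))
subject <- read.table(file.path(tmp, "subject.txt"))

# Every measurement row needs exactly one activity code and one subject id
stopifnot(nrow(x) == nrow(y), nrow(x) == nrow(subject))
dim(x)  # 2 rows, 3 columns
```

The same `stopifnot()` check can be applied to `train_x`, `train_y` and `train_subject` (and the test counterparts) after loading the real files.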

 

2. Merges the training and the test sets to create one data set.

#combine subject, activity code, and measurements column-wise (in that order)
trainData <- cbind(train_subject, train_y, train_x)
testData <- cbind(test_subject, test_y, test_x)

#merge the train and test data
MergeData <- rbind(trainData, testData)
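To make the resulting layout concrete, here is a small sketch with made-up values that follows the same `cbind()` then `rbind()` order. The toy tables have 2 feature columns instead of the 561 in the real data.

```r
# Toy stand-ins shaped like the real tables (2 features instead of 561)
train_subject <- data.frame(V1 = c(1, 3))
train_y       <- data.frame(V1 = c(5, 4))
train_x       <- data.frame(V1 = c(0.1, 0.2), V2 = c(0.3, 0.4))
test_subject  <- data.frame(V1 = 2)
test_y        <- data.frame(V1 = 6)
test_x        <- data.frame(V1 = 0.5, V2 = 0.6)

# Column 1 = subject, column 2 = activity code, columns 3+ = features
trainData <- cbind(train_subject, train_y, train_x)
testData  <- cbind(test_subject, test_y, test_x)
MergeData <- rbind(trainData, testData)

# Rows add up across the two sets; columns are features plus 2
stopifnot(
  nrow(MergeData) == nrow(trainData) + nrow(testData),
  ncol(MergeData) == ncol(train_x) + 2
)
dim(MergeData)  # 3 rows, 4 columns
```

On the full dataset the same check should give 10299 rows (7352 training plus 2947 test) and 563 columns (561 features plus subject and activity).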

 

3. Extracts only the measurements on the mean and standard deviation for each measurement.

##read the feature names (second column of features.txt)
Feature <- read.table("./features.txt", stringsAsFactors = FALSE)[,2]

##keep subject, activity, and the mean()/std() features
##FeatureIndex+2 skips the two leading columns (subject, activity) in MergeData
FeatureIndex <- grep("mean\\(\\)|std\\(\\)", Feature)
DATA <- MergeData[, c(1, 2, FeatureIndex+2)]
colnames(DATA) <- c("subject", "activity", Feature[FeatureIndex])
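The escaped parentheses in the pattern matter: they restrict the match to names containing a literal `mean()` or `std()`. A short sketch on a handful of feature names from the dataset shows what is kept and what is left out.

```r
# A few names as they appear in features.txt
Feature <- c(
  "tBodyAcc-mean()-X",
  "tBodyAcc-std()-X",
  "fBodyAcc-meanFreq()-X",        # has "mean" but not "mean()"
  "angle(tBodyAccMean,gravity)",  # capital "Mean", no "()"
  "tBodyAcc-max()-X"
)

FeatureIndex <- grep("mean\\(\\)|std\\(\\)", Feature)
FeatureIndex           # 1 2
Feature[FeatureIndex]  # "tBodyAcc-mean()-X" "tBodyAcc-std()-X"
```

With this pattern, `meanFreq()` and the `angle(...)` features are excluded. On the full feature list it should select 66 columns (33 means and 33 standard deviations). A looser pattern such as `"mean|std"` would also pull in the `meanFreq()` columns, so which one to use depends on how the assignment is read.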


4. Uses descriptive activity names to name the activities in the data set.

##read the activity lookup table: column 1 = numeric code, column 2 = name
ActivityName <- read.table("./activity_labels.txt")

##replace the numeric codes with the activity names
DATA$activity <- factor(DATA$activity, levels = ActivityName[,1], labels = ActivityName[,2])
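The `factor()` call works as a lookup: `levels` lists the codes as they appear in the data and `labels` gives the name for each code in the same order. The sketch below rebuilds the lookup table in memory, with the six labels from `activity_labels.txt`, and applies it to a few made-up codes.

```r
# In-memory copy of the lookup table: code in column 1, name in column 2
ActivityName <- data.frame(
  V1 = 1:6,
  V2 = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
         "SITTING", "STANDING", "LAYING")
)

codes <- c(5, 1, 6, 5)  # made-up activity codes
activity <- factor(codes, levels = ActivityName[, 1], labels = ActivityName[, 2])

as.character(activity)  # "STANDING" "WALKING" "LAYING" "STANDING"
```

A code that is missing from the lookup table would become `NA`, so `anyNA(DATA$activity)` is a quick way to check the real data afterwards.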

5. Appropriately labels the data set with descriptive variable names.

#strip "()", expand the t/f prefixes, and capitalize Mean/Std
names(DATA) <- gsub("\\(\\)", "", names(DATA))
names(DATA) <- gsub("^t", "time", names(DATA))
names(DATA) <- gsub("^f", "frequency", names(DATA))
names(DATA) <- gsub("-mean", "Mean", names(DATA))
names(DATA) <- gsub("-std", "Std", names(DATA))
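To see what this chain of substitutions produces, the sketch below runs it on four sample column names. The prefix strings are only labels; this sketch spells the frequency prefix `frequency`.

```r
raw <- c("subject", "activity", "tBodyAcc-mean()-X", "fBodyGyro-std()-Z")

clean <- gsub("\\(\\)", "", raw)         # drop the literal "()"
clean <- gsub("^t", "time", clean)       # leading t = time domain
clean <- gsub("^f", "frequency", clean)  # leading f = frequency domain
clean <- gsub("-mean", "Mean", clean)
clean <- gsub("-std", "Std", clean)

clean
# "subject" "activity" "timeBodyAccMean-X" "frequencyBodyGyroStd-Z"
```

The `^` anchors keep `subject` and `activity` untouched, since neither starts with `t` or `f`. The axis suffix (`-X`, `-Z`) stays as it is; one more `gsub("-", "", ...)` would remove the hyphen if fully syntactic names are wanted.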

 

6. From the data set in step 5, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

#aggregate() is part of base R (stats), so no extra package is needed
tidyData <- aggregate(. ~ subject + activity, DATA, mean)
tidyData <- tidyData[order(tidyData$subject, tidyData$activity), ]

#save the clean, tidy data
write.table(tidyData, file = "tidyData.txt", row.names = FALSE)
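The formula `. ~ subject + activity` means: average every other column within each subject/activity pair. A small sketch with made-up numbers shows the grouping and a round trip through `write.table()`; the column name `timeBodyAccMeanX` is just a placeholder.

```r
# Made-up data: one measurement column, two subjects, two activities
DATA <- data.frame(
  subject  = c(1, 1, 1, 2, 2),
  activity = factor(c("WALKING", "WALKING", "SITTING", "WALKING", "WALKING"),
                    levels = c("WALKING", "SITTING")),
  timeBodyAccMeanX = c(0.2, 0.4, 0.1, 0.5, 0.7)
)

# One row per (subject, activity) pair, each holding the group mean
tidyData <- aggregate(. ~ subject + activity, DATA, mean)
tidyData <- tidyData[order(tidyData$subject, tidyData$activity), ]
nrow(tidyData)  # 3 groups: (1, WALKING), (1, SITTING), (2, WALKING)

# Round trip: header = TRUE restores the column names on reading
out <- tempfile(fileext = ".txt")
write.table(tidyData, file = out, row.names = FALSE)
check <- read.table(out, header = TRUE)
stopifnot(nrow(check) == nrow(tidyData))
```

On the full dataset this should give 180 rows (30 subjects times 6 activities), and `read.table("tidyData.txt", header = TRUE)` reads the saved file back in.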

More info and code updates: https://github.com/kidpea/Preparation-OF-Data-Analysis.Data-from-Samsung-Galaxy-S-smartphone-/blob/master/run_analysis.R

