r - Manipulating all split data sets

I'm drawing a blank. I have 51 sets of split data from a data frame that I had, and I want to take the mean of the height of each set.

print(dataset)
$`1`
id   species  plant   height
1             1      42.7
2             1      32.5

$`2`
id   species  plant   height
3             2      43.5
4             2      54.3
5             2      45.7

...

$`51`
id   species  plant   height
134           51     52.5
135           51     61.2

I know how to run each one individually, but with 51 split sections that would take me ages.

I thought that

mean(dataset[,4])

might work, but it says I have the wrong number of dimensions. I see now why that is incorrect, but I am no closer to figuring out how to average the heights.

The dataset is a list. We can use lapply/sapply/vapply etc. to loop through the list elements and get the mean of the 'height' column. Using vapply, we can specify the class and length of the output (numeric(1)). This is useful for debugging.

vapply(dataset, function(x) mean(x[,4], na.rm=TRUE), numeric(1))
#       1        2       51
#37.60000 47.83333 56.85000
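To make the "wrong number of dimensions" error concrete, here is a minimal sketch. It uses a small hypothetical list called `toy` (not the original 51-element data) and shows that a list has no dimensions, while each element inside it is an ordinary data frame. It also shows the sapply and lapply variants mentioned above.

```r
# Hypothetical two-element list standing in for the split data
toy <- list(
  `1` = data.frame(id = 1:2, height = c(42.7, 32.5)),
  `2` = data.frame(id = 3:5, height = c(43.5, 54.3, 45.7))
)

# A plain list has no dim attribute, so row/column indexing fails on it
dim(toy)                                  # NULL
res <- try(toy[, 2], silent = TRUE)       # captures the error instead of stopping
inherits(res, "try-error")

# Each element is a data frame, so column indexing works one level down
first_mean <- mean(toy[["1"]][, "height"])

# sapply simplifies to a named numeric vector; lapply keeps a list
means_s <- sapply(toy, function(x) mean(x$height, na.rm = TRUE))
means_l <- lapply(toy, function(x) mean(x$height, na.rm = TRUE))
```

Unlike sapply, vapply stops with an error if any element returns something other than a single numeric value, which is why it is the safer choice in scripts.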

Another option (if the data.frames in the list all have the same column names/number of columns) is to use rbindlist from data.table with the option idcol=TRUE to generate a single data.table. The '.id' column shows the names of the list elements. We then group by '.id' and get the mean of the 'height'.

library(data.table)
rbindlist(dataset, idcol=TRUE)[, list(mean=mean(height, na.rm=TRUE)), by = .id]
#   .id     mean
#1:   1 37.60000
#2:   2 47.83333
#3:  51 56.85000

A similar option to the above is unnest from library(tidyr), which returns a single dataset with a '.id' column. We then group by '.id' and summarise to get the mean of 'height'.

library(tidyr)
library(dplyr)
unnest(dataset, .id) %>%
    group_by(.id) %>%
    summarise(mean = mean(height, na.rm=TRUE))
#  .id     mean
#1   1 37.60000
#2   2 47.83333
#3  51 56.85000
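More recent tidyr releases expect unnest() to be given a data frame with list-columns, so the call above may not run on a bare list. A sketch of the same idea, assuming dplyr's bind_rows() with its `.id` argument is an acceptable substitute, is shown here on a small hypothetical list `toy`:

```r
library(dplyr)

# Hypothetical list mirroring the structure of the split data
toy <- list(
  `1`  = data.frame(id = 1:2,     height = c(42.7, 32.5)),
  `2`  = data.frame(id = 3:5,     height = c(43.5, 54.3, 45.7)),
  `51` = data.frame(id = 134:135, height = c(52.5, 61.2))
)

# Stack the elements into one data frame; the list names go into '.id'
out <- bind_rows(toy, .id = ".id") %>%
  group_by(.id) %>%
  summarise(mean = mean(height, na.rm = TRUE))
```

Here `.id` is a character column, so the groups are ordered as strings rather than as numbers.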

The syntax with plyr is

library(plyr)
df1 <- unnest(dataset, .id)
ddply(df1, .(.id), summarise, mean=mean(height, na.rm=TRUE))
#  .id     mean
#1   1 37.60000
#2   2 47.83333
#3  51 56.85000

data

dataset <- structure(list(
  `1` = structure(list(id = 1:2, species = c("a", "a"),
      plant = c(1L, 1L), height = c(42.7, 32.5)),
    .Names = c("id", "species", "plant", "height"),
    class = "data.frame", row.names = c(NA, -2L)),
  `2` = structure(list(id = 3:5, species = c("a", "a", "a"),
      plant = c(2L, 2L, 2L), height = c(43.5, 54.3, 45.7)),
    .Names = c("id", "species", "plant", "height"),
    class = "data.frame", row.names = c(NA, -3L)),
  `51` = structure(list(id = 134:135, species = c("a", "a"),
      plant = c(51L, 51L), height = c(52.5, 61.2)),
    .Names = c("id", "species", "plant", "height"),
    class = "data.frame", row.names = c(NA, -2L))),
  .Names = c("1", "2", "51"))
