R - Manipulating all split data sets
I'm drawing a blank. I have 51 sets of split data from a data frame that I had, and I want to take the mean of the height of each set.
    print(dataset)
    $`1`
    id species plant height
    1 1 42.7
    2 1 32.5

    $`2`
    id species plant height
    3 2 43.5
    4 2 54.3
    5 2 45.7
...
...
...
    $`51`
    id species plant height
    134 51 52.5
    135 51 61.2
I know how to run each one individually, but with 51 split sections that would take me ages.
I thought that

    mean(dataset[,4])

might work, but it says I have the wrong number of dimensions. I can see why that is incorrect, but I am no closer to figuring out how to average the heights.
The dataset is a list. We can use lapply/sapply/vapply etc. to loop through the list elements and get the mean of the 'height' column. Using vapply, we can specify the class and length of the output (numeric(1)). This is useful for debugging.
    vapply(dataset, function(x) mean(x[, 4], na.rm = TRUE), numeric(1))
    #       1        2       51
    #37.60000 47.83333 56.85000
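To illustrate why the original attempt fails and why the numeric(1) template helps with debugging, here is a minimal base R sketch. The cut-down `dataset` (only the id and height columns, with the heights from the data at the end of this post) and the names `means`, `loose` and `bad` are assumptions made for the example.

```r
# Cut-down stand-in for the split list (heights taken from the post's data)
dataset <- list(
  `1`  = data.frame(id = 1:2, height = c(42.7, 32.5)),
  `2`  = data.frame(id = 3:5, height = c(43.5, 54.3, 45.7)),
  `51` = data.frame(id = 134:135, height = c(52.5, 61.2))
)

# A list has no dimensions, which is why dataset[, 4] cannot work
is.null(dim(dataset))

# vapply checks that every result is exactly one numeric value
means <- vapply(dataset, function(x) mean(x$height, na.rm = TRUE), numeric(1))

# If the function forgets mean() and returns the whole column,
# sapply quietly hands back a list ...
loose <- sapply(dataset, function(x) x$height)

# ... whereas vapply stops with an error because the result is not numeric(1)
bad <- tryCatch(
  vapply(dataset, function(x) x$height, numeric(1)),
  error = function(e) "vapply error"
)
```

With sapply the mistake would only surface later, when the list is used as if it were a numeric vector; vapply reports it at the call that caused it.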
Another option (if the data.frames in the list have the same column names and number of columns) is to use rbindlist from data.table with the option idcol=TRUE to generate a single data.table. The '.id' column shows the names of the list elements. We group by '.id' and get the mean of the 'height'.
    library(data.table)
    rbindlist(dataset, idcol = TRUE)[, list(mean = mean(height, na.rm = TRUE)), by = .id]
    #   .id     mean
    #1:   1 37.60000
    #2:   2 47.83333
    #3:  51 56.85000
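The same "stack into one table, then group by the list name" idea can also be sketched in base R without extra packages. This is not the data.table approach itself, only an illustration of the two steps; the cut-down `dataset` and the names `stacked` and `res` are assumptions made for the example.

```r
# Cut-down stand-in for the split list (heights taken from the post's data)
dataset <- list(
  `1`  = data.frame(id = 1:2, height = c(42.7, 32.5)),
  `2`  = data.frame(id = 3:5, height = c(43.5, 54.3, 45.7)),
  `51` = data.frame(id = 134:135, height = c(52.5, 61.2))
)

# Step 1: stack all pieces into a single data frame
stacked <- do.call(rbind, dataset)

# Record which list element each row came from, like the '.id' column
stacked$.id <- rep(names(dataset), vapply(dataset, nrow, integer(1)))

# Step 2: group by '.id' and take the mean of 'height'
res <- tapply(stacked$height, stacked$.id, mean, na.rm = TRUE)
res
```

As with rbindlist, this assumes every data.frame in the list has the same columns; do.call(rbind, ...) fails otherwise.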
A similar option to the one above is unnest from library(tidyr), which returns a single dataset with the '.id' column. Grouped by '.id', we summarise to get the mean of 'height'.
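    library(tidyr)
    library(dplyr)
    # Note: calling unnest() on a plain list relies on the tidyr version
    # available when this was written; newer releases may expect a data frame.
    unnest(dataset, .id) %>%
      group_by(.id) %>%
      summarise(mean = mean(height, na.rm = TRUE))
    #  .id     mean
    #1   1 37.60000
    #2   2 47.83333
    #3  51 56.85000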
The syntax for plyr is

    library(plyr)
    df1 <- unnest(dataset, .id)
    ddply(df1, .(.id), summarise, mean = mean(height, na.rm = TRUE))
    #  .id     mean
    #1   1 37.60000
    #2   2 47.83333
    #3  51 56.85000
data

    dataset <- structure(list(
      `1` = structure(list(id = 1:2, species = c("a", "a"),
          plant = c(1L, 1L), height = c(42.7, 32.5)),
          .Names = c("id", "species", "plant", "height"),
          class = "data.frame", row.names = c(NA, -2L)),
      `2` = structure(list(id = 3:5, species = c("a", "a", "a"),
          plant = c(2L, 2L, 2L), height = c(43.5, 54.3, 45.7)),
          .Names = c("id", "species", "plant", "height"),
          class = "data.frame", row.names = c(NA, -3L)),
      `51` = structure(list(id = 134:135, species = c("a", "a"),
          plant = c(51L, 51L), height = c(52.5, 61.2)),
          .Names = c("id", "species", "plant", "height"),
          class = "data.frame", row.names = c(NA, -2L))),
      .Names = c("1", "2", "51"))