How to select certain columns of a dataset to create a new dataset

dplyr::select
R
Data cleaning
tidyverse
Author

Shalom

Published

September 11, 2022

Base R

data <- head(mtcars)  # first six rows of the built-in mtcars dataset
subset(data, select = c(mpg, cyl))  # keep only the mpg and cyl columns
                   mpg cyl
Mazda RX4         21.0   6
Mazda RX4 Wag     21.0   6
Datsun 710        22.8   4
Hornet 4 Drive    21.4   6
Hornet Sportabout 18.7   8
Valiant           18.1   6
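
As a complement to subset(), base R bracket indexing can make the same selection. The sketch below assumes the same six-row sample; the object names by_name and one_col are illustrative only and do not come from the original post.

```r
# Same six-row sample as above
data <- head(mtcars)

# Select columns by name with bracket indexing; names are quoted strings here
by_name <- data[, c("mpg", "cyl")]

# A single column collapses to a plain vector unless drop = FALSE is set
one_col <- data[, "mpg", drop = FALSE]

# Compare with the subset() result
all.equal(by_name, subset(data, select = c(mpg, cyl)))
```

Bracket indexing takes column names as strings, which can be convenient when the names are stored in a variable.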

Tidyverse syntax

The tidyverse is a collection of packages, and dplyr is one of them; its select() function picks columns by name.

library(dplyr)
select(data, c(mpg, cyl))  # function-call form
                   mpg cyl
Mazda RX4         21.0   6
Mazda RX4 Wag     21.0   6
Datsun 710        22.8   4
Hornet 4 Drive    21.4   6
Hornet Sportabout 18.7   8
Valiant           18.1   6
data %>% select(mpg, cyl)  # the same selection written with the pipe
                   mpg cyl
Mazda RX4         21.0   6
Mazda RX4 Wag     21.0   6
Datsun 710        22.8   4
Hornet 4 Drive    21.4   6
Hornet Sportabout 18.7   8
Valiant           18.1   6
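
When the column names are held in a character vector rather than typed directly, the all_of() helper can be used. This is a minimal sketch assuming dplyr 1.0.0 or later; the vector name wanted is hypothetical.

```r
library(dplyr)

data <- head(mtcars)

# Hypothetical vector of column names, e.g. built elsewhere in a script
wanted <- c("mpg", "cyl")

# all_of() selects exactly these columns and errors if any is missing
picked <- data %>% select(all_of(wanted))
names(picked)
```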
names(data)
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
[11] "carb"
data %>% select(contains('p')) # columns whose names contain "p"
                   mpg disp  hp
Mazda RX4         21.0  160 110
Mazda RX4 Wag     21.0  160 110
Datsun 710        22.8  108  93
Hornet 4 Drive    21.4  258 110
Hornet Sportabout 18.7  360 175
Valiant           18.1  225 105
data %>% select(starts_with('c')) %>% names() # names starting with "c"
[1] "cyl"  "carb"
data %>% select(ends_with('p')) %>% names() # names ending with "p"
[1] "disp" "hp"  
data %>% select(matches('.')) %>% names() # names matching a regular expression ('.' matches any character, so every column is selected)
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
[11] "carb"
data %>% select(matches('^c')) %>% names() # names starting with "c"
[1] "cyl"  "carb"
data %>% select(matches('p$')) %>% names() # names ending with "p"
[1] "disp" "hp"  
data %>% select(matches('mpg')) %>% names() # names containing "mpg" (use '^mpg$' for an exact match)
[1] "mpg"
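
The pattern 'mpg' is unanchored, so it would also pick up any other column whose name contains "mpg". The sketch below illustrates the difference on a small made-up data frame; toy and its mpg_city column are hypothetical and not part of mtcars.

```r
library(dplyr)

# Hypothetical toy data: two column names contain "mpg"
toy <- data.frame(mpg = c(21, 22.8), mpg_city = c(18, 20), cyl = c(6, 4))

# Unanchored pattern: matches both mpg and mpg_city
partial <- toy %>% select(matches("mpg")) %>% names()

# Anchored pattern: matches only the column named exactly "mpg"
exact <- toy %>% select(matches("^mpg$")) %>% names()

partial
exact
```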
# Inverting a selection
data %>% select(contains('p')) %>% names()
[1] "mpg"  "disp" "hp"  
data %>% select(-contains('p')) %>% names()
[1] "cyl"  "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"
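
Besides the minus sign, recent versions of dplyr also accept Boolean operators on selection helpers. The following is a minimal sketch, assuming dplyr 1.0.0 or later; the object names no_p and c_without_y are illustrative only.

```r
library(dplyr)

data <- head(mtcars)

# `!` takes the complement of a selection, like the minus sign above
no_p <- data %>% select(!contains("p")) %>% names()

# `&` intersects two selections: names starting with "c" that do not contain "y"
c_without_y <- data %>% select(starts_with("c") & !contains("y")) %>% names()

no_p
c_without_y
```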