Common Operations on R Data Frames: A Summary (Part 3)

2019-08-15 00:09:20

11  4     1      90   49      87
12  7     1      94   38      87

To sort from largest to smallest, wrap the column in desc().

> arrange(df, desc(Chinese))  # sort by Chinese score, highest first
   ID Class Chinese Math English
1   7     1      94   38      87
2   4     1      90   49      87
3   6     3      89   99      46
4   8     2      66   77      95
5   1     2      65   59      23
6   3     3      65   76      67
7   9     3      62   93      43
8   2     2      37   38      45
9   5     2      20   71      34
10 10     1      20   21      76
11 11     2      20   65      23
12 12     3      20   12      94
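arrange() also accepts several sort keys, and desc() can be applied to any one of them. The sketch below assumes the `df` used throughout this article, rebuilt here from the printed output so the snippet runs on its own; the variable name `sorted` is only illustrative.

```r
library(dplyr)

# Sample data reconstructed from the output printed in this article
df <- data.frame(
  ID      = 1:12,
  Class   = c(2, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  Chinese = c(65, 37, 65, 90, 20, 89, 94, 66, 62, 20, 20, 20),
  Math    = c(59, 38, 76, 49, 71, 99, 38, 77, 93, 21, 65, 12),
  English = c(23, 45, 67, 87, 34, 46, 87, 95, 43, 76, 23, 94)
)

# Sort by Class ascending; within each class, by Chinese descending
sorted <- arrange(df, Class, desc(Chinese))
head(sorted, 3)
```

Later keys only break ties left by earlier ones, so the order of the arguments matters.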

2. Removing duplicate rows with distinct()

distinct(.data, ..., .keep_all = FALSE)

> df1 <- df[rep(1:nrow(df), each = 2), ]  # repeat each row of df so it appears twice
> df1
     ID Class Chinese Math English
1     1     2      65   59      23
1.1   1     2      65   59      23
2     2     2      37   38      45
2.1   2     2      37   38      45
3     3     3      65   76      67
3.1   3     3      65   76      67
4     4     1      90   49      87
4.1   4     1      90   49      87
5     5     2      20   71      34
5.1   5     2      20   71      34
6     6     3      89   99      46
6.1   6     3      89   99      46
7     7     1      94   38      87
7.1   7     1      94   38      87
8     8     2      66   77      95
8.1   8     2      66   77      95
9     9     3      62   93      43
9.1   9     3      62   93      43
10   10     1      20   21      76
10.1 10     1      20   21      76
11   11     2      20   65      23
11.1 11     2      20   65      23
12   12     3      20   12      94
12.1 12     3      20   12      94
> df1 <- distinct(df1)  # remove the duplicated rows
> df1
   ID Class Chinese Math English
1   1     2      65   59      23
2   2     2      37   38      45
3   3     3      65   76      67
4   4     1      90   49      87
5   5     2      20   71      34
6   6     3      89   99      46
7   7     1      94   38      87
8   8     2      66   77      95
9   9     3      62   93      43
10 10     1      20   21      76
11 11     2      20   65      23
12 12     3      20   12      94
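The `...` and `.keep_all` arguments in the signature above are not exercised by the example, so here is a minimal sketch of how they might be used. It assumes the same `df` as the rest of the article (rebuilt from the printed output); `classes` and `first_per_score` are illustrative names.

```r
library(dplyr)

# Sample data reconstructed from the output printed in this article
df <- data.frame(
  ID      = 1:12,
  Class   = c(2, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  Chinese = c(65, 37, 65, 90, 20, 89, 94, 66, 62, 20, 20, 20),
  Math    = c(59, 38, 76, 49, 71, 99, 38, 77, 93, 21, 65, 12),
  English = c(23, 45, 67, 87, 34, 46, 87, 95, 43, 76, 23, 94)
)

# Deduplicate on one column only: one row per distinct Class value
classes <- distinct(df, Class)

# Deduplicate on Chinese but keep every column; for each score,
# the first row encountered is the one that is retained
first_per_score <- distinct(df, Chinese, .keep_all = TRUE)
```

With the default `.keep_all = FALSE`, only the columns named in `...` are returned.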

3. Grouping with group_by() and summarising with summarise()

group_by(.data, ..., add = FALSE, .drop = FALSE)

ungroup(x, ...)

summarise(.data, ...)

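group_by() and summarise() are often used together to compute a result for each group. This is also a good place to introduce the pipe operator %>%: it passes the value on its left-hand side to the expression on its right-hand side, where it becomes the first argument of the function being called.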

> df %>%
+   group_by(Class) %>%
+   summarise(max = max(Chinese))  # highest Chinese score within each Class
# A tibble: 3 x 2
  Class   max
  <dbl> <dbl>
1     1    94
2     2    66
3     3    89
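To make the pipe's behaviour concrete, the sketch below writes the same summary with and without %>%, then shows several summaries at once and a use of ungroup(), which appears in the signatures above but not in the example. It assumes the article's `df` (rebuilt from the printed output); mutate() and n() are dplyr functions that this page does not otherwise cover, and the result names are illustrative.

```r
library(dplyr)

# Sample data reconstructed from the output printed in this article
df <- data.frame(
  ID      = 1:12,
  Class   = c(2, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  Chinese = c(65, 37, 65, 90, 20, 89, 94, 66, 62, 20, 20, 20),
  Math    = c(59, 38, 76, 49, 71, 99, 38, 77, 93, 21, 65, 12),
  English = c(23, 45, 67, 87, 34, 46, 87, 95, 43, 76, 23, 94)
)

# Equivalent calls: the pipe supplies its left-hand side as the first argument
nested <- summarise(group_by(df, Class), max = max(Chinese))
piped  <- df %>% group_by(Class) %>% summarise(max = max(Chinese))

# Several summaries per group in one call
stats <- df %>%
  group_by(Class) %>%
  summarise(n = n(), mean_math = mean(Math), max_chinese = max(Chinese))

# ungroup() drops the grouping so that later verbs act on the whole table
ranked <- df %>%
  group_by(Class) %>%
  mutate(class_max = max(Chinese)) %>%  # per-class maximum on every row
  ungroup() %>%
  arrange(desc(Chinese))                # overall ranking, not per class
```

Forgetting ungroup() is a common source of surprises, since grouped data frames keep applying later operations group by group.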

4. Filtering rows with filter()

filter(.data, ..., .preserve = FALSE)

Selects the rows that meet the given conditions (the result is returned as a data frame).

> df %>%
+   group_by(Class) %>%
+   filter(Chinese == max(Chinese))  # the student with the highest Chinese score in each class
# A tibble: 3 x 5
# Groups:   Class [3]
     ID Class Chinese  Math English
  <dbl> <dbl>   <dbl> <dbl>   <dbl>
1     6     3      89    99      46
2     7     1      94    38      87
3     8     2      66    77      95
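filter() also works on ungrouped data and can take more than one condition. As a minimal sketch, assuming the article's `df` (rebuilt from the printed output) and illustrative result names: conditions separated by commas must all hold, while `|` expresses "or".

```r
library(dplyr)

# Sample data reconstructed from the output printed in this article
df <- data.frame(
  ID      = 1:12,
  Class   = c(2, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  Chinese = c(65, 37, 65, 90, 20, 89, 94, 66, 62, 20, 20, 20),
  Math    = c(59, 38, 76, 49, 71, 99, 38, 77, 93, 21, 65, 12),
  English = c(23, 45, 67, 87, 34, 46, 87, 95, 43, 76, 23, 94)
)

# Comma-separated conditions are combined with AND:
# students in class 2 whose Math score is at least 60
class2_pass <- filter(df, Class == 2, Math >= 60)

# Use | for OR: a score of 90 or more in Chinese or in English
high_scorers <- filter(df, Chinese >= 90 | English >= 90)
```

Note that in the grouped example above, ties would not be broken: every row equal to the group maximum is returned.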

5. Selecting columns with select()

select(.data, ...)

> select(df, ID, Chinese, Math, English)  # keep the ID, Chinese, Math and English columns of df
   ID Chinese Math English
1   1      65   59      23
2   2      37   38      45
3   3      65   76      67
4   4      90   49      87
5   5      20   71      34
6   6      89   99      46
7   7      94   38      87
8   8      66   77      95
9   9      62   93      43
10 10      20   21      76
11 11      20   65      23
12 12      20   12      94
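Listing every column by name is not the only way to use select(). The sketch below, assuming the article's `df` (rebuilt from the printed output) and illustrative result names, shows three other forms: dropping a column, taking a range of adjacent columns, and renaming while selecting.

```r
library(dplyr)

# Sample data reconstructed from the output printed in this article
df <- data.frame(
  ID      = 1:12,
  Class   = c(2, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  Chinese = c(65, 37, 65, 90, 20, 89, 94, 66, 62, 20, 20, 20),
  Math    = c(59, 38, 76, 49, 71, 99, 38, 77, 93, 21, 65, 12),
  English = c(23, 45, 67, 87, 34, 46, 87, 95, 43, 76, 23, 94)
)

# Drop one column: same result as the example above
no_class <- select(df, -Class)

# Take a contiguous range of columns, from Chinese through English
scores <- select(df, Chinese:English)

# Rename while selecting: new_name = old_name
renamed <- select(df, StudentID = ID, Chinese)
```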

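6. Combining data frames with rbind() and cbind()

rbind() combines data frames by rows (stacking one below the other); cbind() combines them by columns (placing them side by side).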

# create a new data frame df1
> df1 <- data.frame(ID = 13, Class = 2,
+                   Chinese = 65, Math = 26, English = 84)
> df1
  ID Class Chinese Math English
1 13     2      65   26      84
> rbind(df, df1)  # append df1 to df by rows
   ID Class Chinese Math English
1   1     2      65   59      23
2   2     2      37   38      45
3   3     3      65   76      67
4   4     1      90   49      87
5   5     2      20   71      34
6   6     3      89   99      46
7   7     1      94   38      87
8   8     2      66   77      95
9   9     3      62   93      43
10 10     1      20   21      76
11 11     2      20   65      23
12 12     3      20   12      94
13 13     2      65   26      84
# create a new data frame df2
> df2 <- data.frame(Biological = c(65, 15, 35, 59, 64, 34,
+                                   29, 46, 32, 95, 46, 23))
> df2
   Biological
1          65
2          15
3          35
4          59
5          64
6          34
7          29
8          46
9          32
10         95
11         46
12         23
> cbind(df, df2)  # attach df2 to df by columns
   ID Class Chinese Math English Biological
1   1     2      65   59      23         65
2   2     2      37   38      45         15
3   3     3      65   76      67         35
4   4     1      90   49      87         59
5   5     2      20   71      34         64
6   6     3      89   99      46         34
7   7     1      94   38      87         29
8   8     2      66   77      95         46
9   9     3      62   93      43         32
10 10     1      20   21      76         95
11 11     2      20   65      23         46
12 12     3      20   12      94         23
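Both functions rely on the inputs lining up, which the examples above take for granted. The base R sketch below, assuming the article's `df` (rebuilt from the printed output) and illustrative names such as `new_row` and `mismatch`, shows that rbind() on data frames matches columns by name rather than by position, and that cbind() fails when the row counts cannot be reconciled.

```r
# Sample data reconstructed from the output printed in this article
df <- data.frame(
  ID      = 1:12,
  Class   = c(2, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  Chinese = c(65, 37, 65, 90, 20, 89, 94, 66, 62, 20, 20, 20),
  Math    = c(59, 38, 76, 49, 71, 99, 38, 77, 93, 21, 65, 12),
  English = c(23, 45, 67, 87, 34, 46, 87, 95, 43, 76, 23, 94)
)

# rbind() on data frames matches columns by name,
# so the new row may list its columns in a different order
new_row  <- data.frame(English = 84, Math = 26, Chinese = 65, Class = 2, ID = 13)
combined <- rbind(df, new_row)

# cbind() pairs rows purely by position
df2  <- data.frame(Biological = c(65, 15, 35, 59, 64, 34, 29, 46, 32, 95, 46, 23))
wide <- cbind(df, df2)

# Row counts that cannot be reconciled (here 12 vs 5) raise an error
mismatch <- tryCatch(
  cbind(df, data.frame(x = 1:5)),
  error = function(e) "error"
)
```

Because cbind() does no key matching, it is only safe when both data frames are already in the same row order; when rows must be matched on a key such as ID, the join functions in the next section are the better fit.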

7. Joining data frames with the join functions

inner_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"),...)

left_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)

right_join(x, y, by = NULL, copy = FALSE, suffix = c(".x"
