Regression (Part 2)

2017-10-10 12:07:57

ale,28.785,0,no,northeast,3385.39915
19,female,28.3,0,yes,southwest,17081.08
52,female,37.4,0,no,southwest,9634.538
32,female,17.765,2,yes,northwest,32734.1863
38,male,34.7,2,no,southwest,6082.405
59,female,26.505,0,no,northeast,12815.44495
61,female,22.04,0,no,northeast,13616.3586
53,female,35.9,2,no,southwest,11163.568
19,male,25.555,0,no,northwest,1632.56445
20,female,28.785,0,no,northeast,2457.21115
22,female,28.05,0,no,southeast,2155.6815
19,male,34.1,0,no,southwest,1261.442
22,male,25.175,0,no,northwest,2045.68525
54,female,31.9,3,no,southeast,27322.73386
22,female,36,0,no,southwest,2166.732
34,male,22.42,2,no,northeast,27375.90478
26,male,32.49,1,no,northeast,3490.5491
34,male,25.3,2,yes,southeast,18972.495
29,male,29.735,2,no,northwest,18157.876
......

Walkthrough of the run:

> insurance <- read.csv("insurance.csv", stringsAsFactors = TRUE)  #read the data
> str(insurance)     #inspect the structure of the data.frame
'data.frame':    1338 obs. of  7 variables:
 $ age     : int  19 18 28 33 32 31 46 37 37 60 ...
 $ sex     : Factor w/ 2 levels "female","male": 1 2 2 2 2 1 1 1 2 1 ...
 $ bmi     : num  27.9 33.8 33 22.7 28.9 ...
 $ children: int  0 1 3 0 0 0 1 3 2 0 ...
 $ smoker  : Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 1 1 1 1 ...
 $ region  : Factor w/ 4 levels "northeast","northwest",..: 4 3 3 2 2 3 3 2 1 2 ...
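 $ charges : num  16885 1726 4449 21984 3867 ...
> library("psych")    #load the psych package (lm() itself comes from base R's stats package)
> ins_model <- lm(charges ~ age + children + bmi + sex + smoker + region, data=insurance) #fit a linear regression model on the data set
> summary(ins_model) #show the summary of the fitted model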

Call:
lm(formula = charges ~ age + children + bmi + sex + smoker + 
    region, data = insurance)

Residuals:
     Min       1Q   Median       3Q      Max 
-11304.9  -2848.1   -982.1   1393.9  29992.8 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -11938.5      987.8 -12.086  < 2e-16 ***   
age                256.9       11.9  21.587  < 2e-16 ***  #more asterisks mean a more significant feature
children           475.5      137.8   3.451 0.000577 ***
bmi                339.2       28.6  11.860  < 2e-16 ***
sexmale           -131.3      332.9  -0.394 0.693348    
smokeryes        23848.5      413.1  57.723  < 2e-16 ***
regionnorthwest   -353.0      476.3  -0.741 0.458769    
regionsoutheast  -1035.0      478.7  -2.162 0.030782 *  
regionsouthwest   -960.0      477.9  -2.009 0.044765 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6062 on 1329 degrees of freedom
Multiple R-squared:  0.7509,    Adjusted R-squared:  0.7494 
F-statistic: 500.8 on 8 and 1329 DF,  p-value: < 2.2e-16
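
The last lines of the summary can be cross-checked by hand. The following is a minimal sketch, assuming the standard OLS formulas for adjusted R-squared and the overall F-statistic, and using only the rounded values printed above (so agreement is only up to rounding). The variable names are illustrative and do not appear in the original session.

```r
# Values copied from the summary(ins_model) output above
r2 <- 0.7509   # Multiple R-squared
n  <- 1338     # number of observations, from str(insurance)
p  <- 8        # number of slope coefficients (numerator DF of the F-statistic)

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic: explained vs. unexplained variance per degree of freedom
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

print(round(adj_r2, 4))
print(round(f_stat, 1))
```

Both results should agree with the printed 0.7494 and 500.8; note that n - p - 1 = 1329 is the residual degrees of freedom shown in the output.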

> lmstep <- step(ins_model) #stepwise selection by AIC: drops terms whose removal lowers the AIC
Start:  AIC=23316.43
charges ~ age + children + bmi + sex + smoker + region

           Df  Sum of Sq        RSS   AIC
- sex       1 5.7164e+06 4.8845e+10 23315    #removing sex gives the lowest AIC, so sex is dropped
<none>                   4.8840e+10 23316
- region    3 2.3343e+08 4.9073e+10 23317
- children  1 4.3755e+08 4.9277e+10 23326
- bmi       1 5.1692e+09 5.4009e+10 23449
- age       1 1.7124e+10 6.5964e+10 23717
- smoker    1 1.2245e+11 1.7129e+11 24993

Step:  AIC=23314.58                        #the candidate with the lowest AIC is kept
charges ~ age + children + bmi + smoker + region

           Df  Sum of Sq        RSS   AIC
<none>                   4.8845e+10 23315
- region    3 2.3320e+08 4.9078e+10 23315
- children  1 4.3596e+08 4.9281e+10 23325
- bmi       1 5.1645e+09 5.4010e+10 23447
- age       1 1.7151e+10 6.5996e+10 23715
- smoker    1 1.2301e+11 1.7186e+11 24996
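
The AIC values in the step() trace can be reproduced from the RSS column. This is a sketch, assuming the formula that extractAIC() applies to lm fits, n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. The RSS values are the rounded ones from the tables above, so the match is only approximate, and the helper name aic_from_rss is hypothetical.

```r
n <- 1338  # number of observations, from str(insurance)

# AIC as used by step() for lm fits (up to an additive constant)
aic_from_rss <- function(rss, edf, n) {
  n * log(rss / n) + 2 * edf
}

# Full model: intercept + 8 slopes = 9 coefficients, RSS from the <none> row
aic_full <- aic_from_rss(4.8840e10, 9, n)

# After dropping sex: 8 coefficients, RSS from the "- sex" row
aic_nosex <- aic_from_rss(4.8845e10, 8, n)

print(c(full = aic_full, without_sex = aic_nosex))
```

Dropping sex raises the RSS only slightly while saving one parameter, which is why the AIC falls from about 23316.4 to about 23314.6 and step() keeps the smaller model.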
> predict(lmstep, data.frame(age=70,children=4,bmi=31.5,smoker='yes',region='northeast')) #predict charges for a new profile
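
For intuition about what this prediction computes, the same profile can be evaluated by hand. This is a rough sketch only: the coefficients of the reduced lmstep model are not printed on this page, so it assumes the full-model estimates from summary(ins_model) as a stand-in (with the sex term at its female baseline of 0). The result therefore approximates, rather than reproduces, what predict() returns.

```r
# Full-model estimates copied from the summary(ins_model) table
# (assumption: used here in place of the unprinted lmstep coefficients)
coefs <- c(intercept = -11938.5, age = 256.9, children = 475.5,
           bmi = 339.2, smokeryes = 23848.5)

# New profile: region = northeast is the reference level, so it adds nothing
x <- c(intercept = 1, age = 70, children = 4, bmi = 31.5, smokeryes = 1)

# A linear model's prediction is the sum of coefficient * predictor value
approx_charge <- sum(coefs * x[names(coefs)])
print(approx_charge)
```

This comes to roughly 42,480, of which the smokeryes term alone contributes 23848.5.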
