# $ diff.maximum <dbl> 2678400
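You can see several important characteristics here, such as the start, the end, and the units. The summary also reports quantiles of the time difference (the number of seconds between consecutive observations), which are useful for judging how regular the series is. Because the time scale is monthly and months differ in length, the number of seconds between consecutive observations is not constant.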
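The irregular gaps are easy to verify by hand. The following is a minimal base-R sketch, not part of the original analysis; the variable names `dates` and `gap_seconds` are illustrative, and the dates are generated rather than taken from the beer sales data.

```r
# Monthly timestamps from January 2010 through January 2011 (13 dates, 12 gaps).
dates <- seq(as.Date("2010-01-01"), by = "month", length.out = 13)

# diff() on Date values returns the gaps in days; convert them to seconds.
gap_seconds <- as.numeric(diff(dates)) * 86400

# Distinct gap lengths, one each for 28-, 30- and 31-day months.
sort(unique(gap_seconds))

# Quantiles of the gaps, analogous to the diff.* entries in the summary.
quantile(gap_seconds)
```

The largest gap corresponds to a 31-day month, 31 * 86400 = 2678400 seconds, which agrees with the diff.maximum value shown above.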
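STEP 1: Augment the Time Series Signature
The tk_augment_timeseries_signature() function expands the timestamp information column-wise into a machine learning feature set, adding the time series information columns to the original data frame.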
# Augment (adds data frame columns)
beer_sales_tbl_aug <- beer_sales_tbl %>%
    tk_augment_timeseries_signature()
beer_sales_tbl_aug
## # A tibble: 84 x 30
## date price index.num diff year year.iso half quarter
## <date> <int> <int> <int> <int> <int> <int> <int>
## 1 2010-01-01 6558 1262304000 NA 2010 2009 1 1
## 2 2010-02-01 7481 1264982400 2678400 2010 2010 1 1
## 3 2010-03-01 9475 1267401600 2419200 2010 2010 1 1
## 4 2010-04-01 9424 1270080000 2678400 2010 2010 1 2
## 5 2010-05-01 9351 1272672000 2592000 2010 2010 1 2
## 6 2010-06-01 10552 1275350400 2678400 2010 2010 1 2
## 7 2010-07-01 9077 1277942400 2592000 2010 2010 2 3
## 8 2010-08-01 9273 1280620800 2678400 2010 2010 2 3
## 9 2010-09-01 9420 1283299200 2678400 2010 2010 2 3
## 10 2010-10-01 9413 1285891200 2592000 2010 2010 2 4
## # ... with 74 more rows, and 22 more variables: month <int>,
## # month.xts <int>, month.lbl <ord>, day <int>, hour <int>,
## # minute <int>, second <int>, hour12 <int>, am.pm <int>,
## # wday <int>, wday.xts <int>, wday.lbl <ord>, mday <int>,
## # qday <int>, yday <int>, mweek <int>, week <int>, week.iso <int>,
## # week2 <int>, week3 <int>, week4 <int>, mday7 <int>
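To make a few of these columns less opaque, here is a rough base-R sketch that derives some of them by hand from a date vector. It is an approximation for illustration only and is not how timetk computes the signature internally; the data frame name `signature_sketch` is hypothetical, and only a handful of the 30 columns are reproduced.

```r
# Hand-rolled approximation of a few signature columns (not timetk's own code).
dates <- seq(as.Date("2010-01-01"), by = "month", length.out = 4)

# A Date is stored as days since 1970-01-01, so multiplying by 86400
# gives seconds since the epoch, which is what index.num holds.
index_num <- as.numeric(dates) * 86400

# Calendar month as an integer from 1 to 12.
month_num <- as.integer(format(dates, "%m"))

signature_sketch <- data.frame(
  date      = dates,
  index.num = index_num,
  year      = as.integer(format(dates, "%Y")),
  half      = ifelse(month_num <= 6L, 1L, 2L),   # 1 = Jan-Jun, 2 = Jul-Dec
  quarter   = (month_num - 1L) %/% 3L + 1L,      # 1 to 4
  month     = month_num
)

signature_sketch
```

The index.num, year, half and quarter values produced for these four dates can be compared against the first four rows of the tibble above.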
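STEP 2: Model
Any regression model can be applied to the data; here we use lm(). Note that we drop the date and diff columns. Most algorithms cannot work with date data directly, and the diff column is of little use for machine learning (it is more useful for finding time gaps in the data).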
# linear regression model used, but can use any model
fit_lm <- lm(
price ~ .,
data = select(
beer_sales_tbl_aug,
-c(date, diff)))
summary(fit_lm)
##
## Call:
## lm(formula = price ~ ., data = select(beer_sales_tbl_aug, -c(date,
## diff)))
##
## Residuals:
## Min 1Q Median 3Q Max
## -447.3 -145.4 -18.2 169.8 421.4
##
## Coefficients: (16 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.660e+08 1.245e+08 2.940 0.004738 **
## index.num 5.900e-03 2.003e-03 2.946 0.004661 **
## year -1.974e+05 6.221e+04 -3.173 0.002434 **
## year.iso 1.159e+04 6.546e+03 1.770 0.082006 .
## half -2.132e+03 6.107e+02 -3.491 0.000935 ***
## quarter -1.239e+04 2.190e+04 -0.566 0.573919
## month -3.910e+03 7.355e+03 -0.532 0.597058
## month.xts NA NA NA NA
## month.lbl.L NA NA NA NA
## month.lbl.Q -1.643e+03 2.069e+02 -7.942 8.59e-11 ***
## month.lbl.C 8.368e+02 5.139e+02 1.628 0.108949
## month.lbl^4 6.452e+02 1.344e+02 4.801 1.18e-05 ***
## month.lbl^5 7.563e+02 4.241e+02 1.783 0.079852 .
## month.lbl^6 3.206e+02 1.609e+02 1.992 0.051135 .
## month.lbl^7 -3.537e+02 1.867e+02 -1.894 0.063263 .
## month.lbl^8 3.687e+02 3.217e+02 1.146 0.256510
## month.lbl^9 NA NA NA NA
## month.lbl^10 6.339e+02 2.240e+02 2.830 0.006414 **
## month.lbl^11 NA NA NA NA
## day NA NA NA NA
## hour NA NA NA NA
## minute NA NA NA NA
## second NA NA NA NA
## hour12 NA NA NA NA
## am.pm NA NA NA NA
## wday -8.264e+01 1.898e+01 -4.353 5.63e-05 ***
## wday.xts NA NA NA NA
## wday.lbl.L NA NA NA NA
## wday.lbl.Q -7.109e+02 1.093e+02 -6.503 2.13e-08 ***
## wday.lbl.C 2.355e+02 1.336e+02 1.763 0.083273 .
## wday.lbl^4 8.033e+01 1.133e+02 0.709 0.481281
## wday.lbl^5 6.480e+01 8.029e+01 0.807 0.422