Model Training and Testing Workflow

Fig-1: The First Model

Loading & Preparing Data

Splitting for Classification
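
The splitting code itself is not shown here. The outputs below (20,008 training rows and 8,576 test rows) are consistent with a roughly 70/30 split, so a stratified split along those lines might look like the following minimal sketch. The data frame `A`, its columns, and the 70% ratio are assumptions for illustration, not the original code.

```r
# Hypothetical data: one row per customer with a logical target `buy`.
set.seed(1)
A <- data.frame(
  f   = rpois(1000, 3),
  buy = runif(1000) < 0.45
)

# Stratified split: sample 70% of the row indices within each class so the
# training and test sets keep a similar share of buyers.
idx_by_class <- split(seq_len(nrow(A)), A$buy)
train_idx <- unlist(
  lapply(idx_by_class, function(i) i[sample.int(length(i), round(0.7 * length(i)))]),
  use.names = FALSE
)

TR <- A[train_idx, ]    # training set
TS <- A[-train_idx, ]   # test set

c(train = nrow(TR), test = nrow(TS))
c(train = mean(TR$buy), test = mean(TS$buy))   # class shares should be close
```

Stratifying on the target keeps the baseline accuracy of the two sets comparable, which matters when the test-set accuracy is later compared against that baseline.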


Classification Model

## 
## Call:
## glm(formula = buy ~ ., family = binomial(), data = TR[, c(2:9, 
##     11)])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.7931  -0.8733  -0.6991   1.0384   1.8735  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.259e+00  1.261e-01  -9.985  < 2e-16 ***
## r            -1.227e-02  8.951e-04 -13.708  < 2e-16 ***
## s             9.566e-03  9.101e-04  10.511  < 2e-16 ***
## f             2.905e-01  1.593e-02  18.233  < 2e-16 ***
## m            -3.028e-05  2.777e-05  -1.090  0.27559    
## rev           4.086e-05  1.940e-05   2.106  0.03521 *  
## raw          -2.306e-04  8.561e-05  -2.693  0.00708 ** 
## agea29       -4.194e-02  8.666e-02  -0.484  0.62838    
## agea34        1.772e-02  7.992e-02   0.222  0.82456    
## agea39        7.705e-02  7.921e-02   0.973  0.33074    
## agea44        8.699e-02  8.132e-02   1.070  0.28476    
## agea49        1.928e-02  8.457e-02   0.228  0.81962    
## agea54        1.745e-02  9.323e-02   0.187  0.85155    
## agea59        1.752e-01  1.094e-01   1.602  0.10926    
## agea64        6.177e-02  1.175e-01   0.526  0.59904    
## agea69        2.652e-01  1.047e-01   2.533  0.01131 *  
## agea99       -1.419e-01  1.498e-01  -0.947  0.34347    
## areaz106     -4.105e-02  1.321e-01  -0.311  0.75603    
## areaz110     -2.075e-01  1.045e-01  -1.986  0.04703 *  
## areaz114      3.801e-02  1.111e-01   0.342  0.73214    
## areaz115      2.599e-01  9.682e-02   2.684  0.00727 ** 
## areaz221      1.817e-01  9.753e-02   1.863  0.06243 .  
## areazOthers  -4.677e-02  1.045e-01  -0.448  0.65435    
## areazUnknown -1.695e-01  1.232e-01  -1.375  0.16912    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 27629  on 20007  degrees of freedom
## Residual deviance: 23295  on 19984  degrees of freedom
## AIC: 23343
## 
## Number of Fisher Scoring iterations: 5
##        predict
## actual  FALSE TRUE
##   FALSE  3730  873
##   TRUE   1700 2273
## [1] 0.5367304 0.6999767
##                     [,1]
## FALSE vs. TRUE 0.7556038
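
The two numbers printed after the confusion matrix can be reproduced from the matrix itself: the first is the baseline accuracy of always predicting the majority class, the second is the model's accuracy on the test set. The sketch below recomputes both. The AUC cannot be recovered from the matrix alone because it needs the predicted probabilities, so the rank-based `auc()` helper is a hypothetical illustration run on toy scores, not the function used in the original.

```r
# Confusion matrix copied from the output above
# (rows = actual, columns = predicted).
cm <- matrix(c(3730, 1700, 873, 2273), nrow = 2,
             dimnames = list(actual  = c("FALSE", "TRUE"),
                             predict = c("FALSE", "TRUE")))

n <- sum(cm)
baseline <- max(rowSums(cm)) / n   # always predict the majority class
accuracy <- sum(diag(cm)) / n      # share of correct predictions
round(c(baseline = baseline, accuracy = accuracy), 7)

# Rank-based AUC (Mann-Whitney form): the probability that a randomly chosen
# positive case receives a higher score than a randomly chosen negative one.
auc <- function(score, actual) {
  r  <- rank(score)                # average ranks handle ties
  n1 <- sum(actual)
  n0 <- sum(!actual)
  (sum(r[actual]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy scores only; the real predicted probabilities are not in this document.
auc(c(0.1, 0.4, 0.35, 0.8), c(FALSE, FALSE, TRUE, TRUE))
```

An accuracy of about 0.70 against a baseline of about 0.54 shows the model adds information beyond the class proportions.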


Regression Model

## 
## Call:
## lm(formula = amount ~ ., data = TR2[, c(2:6, 8:10)])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.02874 -0.23292  0.05011  0.28423  1.45108 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.1651034  0.0501997  23.209  < 2e-16 ***
## r             0.0003390  0.0003138   1.080  0.27999    
## s             0.0002380  0.0003164   0.752  0.45200    
## f             0.0266666  0.0018112  14.723  < 2e-16 ***
## m             0.5165187  0.0375332  13.762  < 2e-16 ***
## rev           0.0240517  0.0363911   0.661  0.50868    
## agea29        0.0471837  0.0252728   1.867  0.06194 .  
## agea34        0.0896806  0.0233116   3.847  0.00012 ***
## agea39        0.1203331  0.0229212   5.250 1.56e-07 ***
## agea44        0.1107960  0.0235428   4.706 2.56e-06 ***
## agea49        0.0649780  0.0244296   2.660  0.00783 ** 
## agea54        0.0838574  0.0266168   3.151  0.00163 ** 
## agea59        0.0395519  0.0310826   1.272  0.20323    
## agea64        0.0059463  0.0325943   0.182  0.85525    
## agea69       -0.0399961  0.0289299  -1.383  0.16685    
## agea99        0.0892782  0.0408041   2.188  0.02870 *  
## areaz106      0.0955455  0.0427171   2.237  0.02533 *  
## areaz110      0.0526326  0.0347075   1.516  0.12944    
## areaz114      0.0154046  0.0364721   0.422  0.67277    
## areaz115      0.0193686  0.0317954   0.609  0.54243    
## areaz221      0.0350306  0.0320485   1.093  0.27440    
## areazOthers   0.0385476  0.0344270   1.120  0.26288    
## areazUnknown  0.0130805  0.0387052   0.338  0.73541    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4249 on 9246 degrees of freedom
## Multiple R-squared:  0.2796, Adjusted R-squared:  0.2779 
## F-statistic: 163.1 on 22 and 9246 DF,  p-value: < 2.2e-16
##   R2train    R2test 
## 0.2795931 0.2845795
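
R2train matches the Multiple R-squared reported in the summary, and R2test is the same quantity evaluated on held-out data. A minimal sketch of one common way to compute both follows, using simulated data. The data frame `D`, the split, and the choice of the training mean as the reference for the total sum of squares are all assumptions here.

```r
# Simulated stand-in for the regression data (hypothetical columns).
set.seed(2)
n <- 500
D <- data.frame(f = rpois(n, 3), m = rnorm(n, 3, 0.4))
D$amount <- 1 + 0.03 * D$f + 0.5 * D$m + rnorm(n, sd = 0.4)

train <- seq_len(n) <= 350
fit <- lm(amount ~ ., data = D[train, ])

# R-squared = 1 - SSE / SST, with SST measured around a reference mean.
r2 <- function(actual, pred, ref_mean) {
  sse <- sum((actual - pred)^2)
  sst <- sum((actual - ref_mean)^2)
  1 - sse / sst
}

mu <- mean(D$amount[train])   # assumption: training mean used for both sets
R2 <- c(
  R2train = r2(D$amount[train], fitted(fit), mu),
  R2test  = r2(D$amount[!train], predict(fit, D[!train, ]), mu)
)
R2
```

When the two values are close, as they are in the output above (0.2796 vs. 0.2846), the model is not noticeably overfitting the training set.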


Engineering Variables and Improving the Model

Fig-2: Prediction

Making Predictions

Fig-3: Prediction

Aggregate the data from 2000-12-01 to 2001-02-28.

## [1] 28531
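
The aggregation step can be sketched as follows: filter the transaction log to the window, then collapse it to one row per customer. The transaction table `Z`, its column names, and the reading of `r` and `f` as recency and frequency are assumptions for illustration. The source does not define these variables.

```r
# Hypothetical transaction log; column names are illustrative only.
Z <- data.frame(
  cust   = c("a", "a", "b", "c", "c", "c"),
  date   = as.Date(c("2000-11-20", "2000-12-05", "2001-01-15",
                     "2000-12-01", "2001-02-28", "2001-03-02")),
  amount = c(100, 250, 80, 40, 60, 500)
)

# Keep only transactions inside the window, both ends inclusive.
from <- as.Date("2000-12-01")
to   <- as.Date("2001-02-28")
W <- Z[Z$date >= from & Z$date <= to, ]

# Collapse to one row per customer.
by_cust <- split(W, W$cust)
B <- data.frame(
  cust  = names(by_cust),
  r     = sapply(by_cust, function(d) as.numeric(to - max(d$date))),  # days since last purchase
  f     = sapply(by_cust, nrow),                                      # purchases in the window
  total = sapply(by_cust, function(d) sum(d$amount)),                 # spend in the window
  row.names = NULL
)
nrow(B)   # number of customers with a purchase in the window
```

The count printed in the output above is presumably the analogous row count for the real data.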

B holds one record per customer. B$Buy is the predicted probability that the customer makes a purchase in March.
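
A probability column like this is typically obtained by calling `predict()` on the fitted logistic model with `type = "response"`. The sketch below uses simulated data, and the predictors and coefficients are hypothetical. Without `type = "response"`, `predict()` on a `glm` returns log-odds rather than probabilities.

```r
# Simulated training data (hypothetical predictors f and r).
set.seed(3)
n <- 400
TR <- data.frame(f = rpois(n, 3), r = runif(n, 0, 90))
TR$buy <- runif(n) < plogis(-1 + 0.3 * TR$f - 0.01 * TR$r)

model <- glm(buy ~ ., data = TR, family = binomial())

# New customers to score; the columns must match the training predictors.
B <- data.frame(f = c(1, 5, 10), r = c(80, 30, 2))
B$Buy <- predict(model, newdata = B, type = "response")   # probabilities in (0, 1)
B
```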

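💡: When predicting purchase amounts, remember the log and exponential transformations: log-transform the inputs the same way as in training, then exponentiate the prediction to return to the original scale!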
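
A minimal sketch of that round trip, on simulated data. The base-10 logarithm, the data frames, and the column names `m_raw` and `Rev` are assumptions for illustration. The source does not state which base was used.

```r
# Simulated raw-scale data (hypothetical): past spend and next-period amount.
set.seed(4)
n <- 300
raw <- data.frame(m = runif(n, 50, 5000))
raw$amount <- raw$m * exp(rnorm(n, sd = 0.3))

# Train on the log10 scale.
TR2 <- data.frame(m = log10(raw$m), amount = log10(raw$amount))
fit <- lm(amount ~ m, data = TR2)

# Predict for new customers.
B <- data.frame(m_raw = c(100, 1000))
B$m   <- log10(B$m_raw)                  # same transform as in training
B$Rev <- 10^predict(fit, newdata = B)    # back-transform to the original scale
B
```

Skipping either step fails silently: feeding raw values into a model trained on logs gives wildly inflated predictions, and forgetting the back-transform leaves the result in log units.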